Releases: datahub-project/datahub
DataHub v0.9.1
Release Highlights
Known Issues
- In embedded search experiences (Glossary Terms, Domains, Lineage), filters can become "locked" in place once selected. This is addressed in v0.9.2
User Experience
- Column-level Impact Analysis is here! You can now see the full end-to-end list of column dependencies; watch the demo here
- When creating a Glossary Term from the UI, you can now add the description in the same step
- We now support adding Domains to Glossary Terms
- You can now preview Entity Names and Types in browser tabs
- The login page now includes a "Login with SSO" button
Bug Fixes
- Assertions Tab functionality is restored
- SSO: The continuous login loop reported when the session cookie size exceeds 4096 characters has been addressed
- The ingestion scheduler now works correctly with more than 30 ingestion sources. Previously, a bug caused certain ingestion sources to become unscheduled.
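To illustrate the cookie-size condition behind the SSO fix, here is a minimal sketch that checks whether a cookie pair stays within a 4096-byte budget. The limit is taken from the figure quoted in the note above; the cookie name `session` and the helper names are hypothetical and are not part of DataHub.

```python
# Per-cookie budget, using the 4096 figure quoted in the release note.
# Assumption: size is measured in UTF-8 bytes of the "name=value" pair.
MAX_COOKIE_BYTES = 4096


def cookie_size(name: str, value: str) -> int:
    """Return the UTF-8 byte length of the 'name=value' pair."""
    return len(f"{name}={value}".encode("utf-8"))


def fits_in_cookie(name: str, value: str, limit: int = MAX_COOKIE_BYTES) -> bool:
    """True when the cookie pair does not exceed the size budget."""
    return cookie_size(name, value) <= limit


# Hypothetical session payloads: one small, one oversized.
small_ok = fits_in_cookie("session", "a" * 100)
large_ok = fits_in_cookie("session", "a" * 5000)
```

A check like this can help spot oversized session payloads (for example, a large set of SSO claims) before they are written to a cookie.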
Metadata Ingestion
- New Ingestion Source: Databricks Unity Catalog - check out the docs here
- Tableau: Column-level lineage and Stateful Ingestion are now supported
- LookML: Improved column-level lineage
- BigQuery: we have promoted bigquery-beta to bigquery
- Snowflake: Stateful Ingestion now supports deleting Containers
DataHub Docs Site
We continue to push improved feature guides to the DataHub docs site, including:
- Sync Status
- DataHub Roles
- Dataset Usage and Query History
- DataHub Access Policies
- Managed DataHub Metadata Tests
What's Changed
- feat(ui): looker, lookml - add banner to cross-link ingestion by @Ankit-Keshari-Vituity in #6111
- feat(ingest): infer aspect name from type in get_aspect by @hsheth2 in #6033
- feat(ingestion): Tableau stateful ingestion by @amanda-her in #6094
- feat(ingest): include raw s3 paths if s3 source by @hsheth2 in #6168
- feat(secrets) Allow creating secrets with multiline values in the UI by @chriscollins3456 in #6169
- feat(ingest/tableau): support dashboard tags by @hsheth2 in #6185
- feat(ingest): bigquery-beta - Parsing view ddl definition for lineage by @treff7es in #6187
- fix(ingest) - bigquery-beta - Using table ref instead of table id by @treff7es in #6193
- docs(roles): update roles docs to new doc format by @aditya-radhakrishnan in #6175
- docs(posts): add posts feature guide by @aditya-radhakrishnan in #6184
- feat(ingest): include instance in container dataPlatform when provided by @hsheth2 in #6083
- feat(telemetry): add telemetry events to the settings page by @aditya-radhakrishnan in #6198
- Worked to update the ingestion type while editing by @Ankit-Keshari-Vituity in #6156
- fix(ingest): add lower bound for ujson dep version by @hsheth2 in #6189
- feat(ingest/tableau): emit status aspects + streamline stateful ingestion by @hsheth2 in #6188
- feat(ingest): support self-signed certs in Tableau by @hsheth2 in #6172
- fix(ingest): report warning/error counts correctly by @hsheth2 in #6128
- fix(ingest): Closeable as a context manager by @hsheth2 in #6067
- feat(ingestion-ui) Add new form for the bigquery-beta connector by @chriscollins3456 in #6200
- feat(ingest): add platform instance to tableau by @alaponin in #5978
- feat(release): bump CLI version to 0.9.0 by @szalai1 in #6195
- fix(frontend): fix UI message in create group modal by @liyuhui666 in #6205
- docs: dataset usage and query history feature guide by @treff7es in #5900
- fix(glossary) Improve business glossary loading performance by @chriscollins3456 in #6208
- feat(ingest): replace base85's pickle with json by @hsheth2 in #6178
- docs: add sync status feature guide by @hsheth2 in #5897
- feat(frontend): add custom ssl truststore settings by @alexey-kravtsov in #6090
- docs(spark): add configuration instructions for databricks by @mayurinehate in #6206
- fix(ingest): use corpGroup instead of corpgroup by @hsheth2 in #6202
- build: upgrade gradle wrapper by @hsheth2 in #6203
- fix(ingest): catch errors when profiling for sample values by @mayurinehate in #6194
- fix(ingest): only restrict GE version for hive by @hsheth2 in #6170
- feat(ingest/GE): enable debug logs to stdout when DATAHUB_DEBUG env var is set by @mayurinehate in #6192
- feat(ingest): allow selfsigned certificate in s3 source by @mayurinehate in #6179
- build(ingest): remove markupsafe dep and bump pytest-docker by @hsheth2 in #6201
- docs(access policies): Creating Proper Access Policies Guide by @jjoyce0510 in #6001
- feat(ingest): support deletion of containers in snowflake stateful in… by @mayurinehate in #6180
- fix(glossary) Improve performance when getting root glossary terms by @chriscollins3456 in #6214
- fix(ui) Fix bigquery and redshift forms for lineage fields by @chriscollins3456 in #6215
- fix(ui) Properly display column-level lineage with v2 field paths by @chriscollins3456 in #6217
- fix(ingest): bigquery-beta - Add stacktrace to bigquery schema ingest logs by @treff7es in #6226
- tests(embedded search): adding domain & container tests by @gabe-lyons in #6221
- fix(docs): fix pdl link for mxe docs by @aditya-radhakrishnan in #6230
- feat(telemetry): add telemetry events to the glossary, domains, and managed ingestion pages by @aditya-radhakrishnan in #6216
- fix(ingest): bigquery-beta - Adding python 3.8 fix for memory footprint util by @treff7es in #6228
- docs(quickstart): enable slack community link by @jx2lee in #6209
- fix(build): allow image tag via env, fix requirements by @anshbansal in #6237
- fix(ingest): remove back-ticks from table name when creating urn by @mayurinehate in #6236
- feat(ingest): bigquery-beta - Add option to lowercase urns by @treff7es in #6240
- fix(ingest): presto-on-hive - Adding db name to the presto on hive urn by @treff7es in #6024
- Worked on the CSS issue of Add Owners Modal by @Ankit-Keshari-Vituity in #6223
- fix(ingest): stateful-ingestion - keep dataset urn case in checkpoints by @treff7es in #6244
- Create Tag Modal Issue: Clear the input value on press. by @Ankit-Keshari-Vituity in #6212
- feat(build): add cypress tests for glossary and deprecation by @anshbansal in #6249
- feat(ingest): hive-on-presto - Add option to properly filter hive schemas by @treff7es in #6247
- fix(ingest):lookml - better column-level linea...
DataHub v0.9.0
Release Highlights
Known Issues
Assertions Tab UX bug
This release introduced a bug in the assertions tab causing assertion results to be hidden. This will be addressed in the subsequent release.
Release Notes
We’re excited to announce the release of DataHub v0.9.0!
This minor release includes an upgrade to Java 11 and surfacing Column-Level Lineage support within the DataHub UI.
Here are some additional highlights:
User Experience
- Column-Level Lineage is now surfaced within the DataHub UI!
- Advanced Search now supports searching by Column-level details (e.g., name, description, tag), as well as complex AND/OR statements. For example:
- Show results that match any filters
- Show results that match all filters
- Owner is either of Shannon or Mark
- Owner is neither Shannon nor Mark
- Try it in demo here
- You can now invite users and assign them to a default DataHub Role
- Improvements to site performance during the Browse experience
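The Advanced Search examples above (match any, match all, owner is either of, owner is neither of) can be sketched as composable predicates. This is only an illustration of the AND/OR semantics, not DataHub's search implementation; the entity records and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Filter = Callable[[Entity], bool]


def owner_in(*names: str) -> Filter:
    """True when at least one of the entity's owners is in `names`."""
    return lambda entity: any(owner in names for owner in entity["owners"])


def has_tag(tag: str) -> Filter:
    """True when the entity carries the given tag."""
    return lambda entity: tag in entity["tags"]


def negate(condition: Filter) -> Filter:
    """Invert a filter (used for the 'neither ... nor' case)."""
    return lambda entity: not condition(entity)


def match_any(*conditions: Filter) -> Filter:
    """OR: keep entities that satisfy at least one filter."""
    return lambda entity: any(c(entity) for c in conditions)


def match_all(*conditions: Filter) -> Filter:
    """AND: keep entities that satisfy every filter."""
    return lambda entity: all(c(entity) for c in conditions)


def search(entities: List[Entity], condition: Filter) -> List[str]:
    """Return the names of the entities that pass the condition."""
    return [e["name"] for e in entities if condition(e)]


# Hypothetical sample data.
ENTITIES = [
    {"name": "orders", "owners": ["Shannon"], "tags": ["pii"]},
    {"name": "customers", "owners": ["Mark"], "tags": []},
    {"name": "events", "owners": ["Alex"], "tags": ["pii"]},
]

owner_filter = owner_in("Shannon", "Mark")
any_match = search(ENTITIES, match_any(owner_filter, has_tag("pii")))
all_match = search(ENTITIES, match_all(owner_filter, has_tag("pii")))
either_owner = search(ENTITIES, owner_filter)
neither_owner = search(ENTITIES, negate(owner_filter))
```

With this sample data, "match all" narrows the result to the single dataset owned by Shannon or Mark that is also tagged `pii`, while "match any" returns every dataset.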
Developer Experience
- DataHub has been upgraded to Java 11!
- Improved tracking of GraphQL errors for bug resolution
- CorpUser and CorpGroup are now available via the Python SDK
Metadata Ingestion
- Automatically extract Column-Level Lineage from Snowflake & Looker sources
- dbt Meta Mapping is now supported at the Column Level - this means you can automatically extract Tags and Glossary Terms from your dbt model and surface them in DataHub
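As a rough illustration of what column-level meta mapping does, the sketch below turns a column's dbt `meta` values into tags and glossary terms using match rules. It is not DataHub's implementation: the rule shape (`match`, `operation`, `config`) is loosely modeled on the recipe option and should be treated as an assumption, and every key, tag, and term name here is hypothetical.

```python
import re
from typing import Any, Dict, List

# Hypothetical mapping rules: meta key -> (regex to match, operation, target).
COLUMN_META_MAPPING: Dict[str, Dict[str, Any]] = {
    "has_pii": {
        "match": "true",
        "operation": "add_tag",
        "config": {"tag": "pii"},
    },
    "business_area": {
        "match": "finance|sales",
        "operation": "add_term",
        "config": {"term": "BusinessMetric"},
    },
}


def apply_column_meta_mapping(
    column_meta: Dict[str, Any], rules: Dict[str, Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Collect the tags and terms produced by rules whose pattern matches."""
    result: Dict[str, List[str]] = {"tags": [], "terms": []}
    for key, rule in rules.items():
        if key not in column_meta:
            continue
        # Compare against the string form of the meta value, ignoring case.
        value = str(column_meta[key])
        if not re.fullmatch(rule["match"], value, flags=re.IGNORECASE):
            continue
        if rule["operation"] == "add_tag":
            result["tags"].append(rule["config"]["tag"])
        elif rule["operation"] == "add_term":
            result["terms"].append(rule["config"]["term"])
    return result


# Hypothetical meta blocks for two columns in a dbt model.
email_column = apply_column_meta_mapping(
    {"has_pii": True, "business_area": "finance"}, COLUMN_META_MAPPING
)
id_column = apply_column_meta_mapping({"has_pii": False}, COLUMN_META_MAPPING)
```

Consult the dbt source documentation for the actual option names and supported operations before writing a recipe.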
What's Changed
- fix(ingest): bigquery-beta - Getting datasets with biquery client by @treff7es in #6039
- feat(roles): add ability to invite users into a role by @aditya-radhakrishnan in #6015
- refactor(java11) - convert most modules to java 11 by @leifker in #5836
- docs(readme): Fixing broken article link by @davrax in #6042
- refactor(ingest): streamline pydantic configs by @hsheth2 in #6011
- docs(ingest): add example of dbt column_meta_mapping by @hsheth2 in #6038
- refactor(ingest): use aspect map in transformers by @hsheth2 in #6040
- feat(ui): Adding placeholder entity for DataPlatform by @jjoyce0510 in #6045
- feat(ingest): implement compression for CheckpointState by @alexey-kravtsov in #6007
- feat(advanced-search): adding select value modal by @gabe-lyons in #6026
- fix(ingest): bigquery-beta - Additional fixes for Bigquery beta by @treff7es in #6051
- feat(advanced search): adding advanced search filter component & prereqs for it by @gabe-lyons in #6055
- docs(ingest): add path spec examples for s3 by @mayurinehate in #6050
- fix(deps): metadata-io - remove parquet dependency by @shirshanka in #6046
- fix(ingestion): Tableau test case execution fix by @mohdsiddique in #6005
- feat(ingest): list referenced env variables in recipe by @hsheth2 in #6043
- fix(ingest): compat with mypy 0.981 by @hsheth2 in #6056
- fix(elasticsearch_index): create datahub_usage_event index where datahub_analytics_enabled set to false by @GyuhoonK in #5974
- docs(approval workflows): adding approval workflow docs by @gabe-lyons in #5896
- feat(retention): disable applying retention on bootstrap by @anshbansal in #6066
- fix(ingest): correct tableau browse paths by @hsheth2 in #6064
- fix(ingest): bigquery-beta - handling complex types properly by @treff7es in #6062
- docs: create SECURITY.md by @laulpogan in #6069
- fix(containers): show soft deleted status of containers by @gabe-lyons in #6072
- docs(ingest): clarify bigquery-beta multiproject setup by @hsheth2 in #6071
- chore(setup): change defaults for partitions by @anshbansal in #6074
- refactor(browse): Improving Browse Feature Performance by @jjoyce0510 in #6073
- feat(ingest): add column-level lineage support for snowflake by @mayurinehate in #6034
- feat(ingest): looker - support for simple column level lineage by @shirshanka in #6084
- fix(elastic-setup) Fixing env var logic by @pedro93 in #6079
- Revert "chore(setup): change defaults for partitions (#6074)" by @pedro93 in #6086
- fix(mae-consumer): fix regression on base64 encoding by @codesorcery in #6061
- fix(elasticsearch) Analytics indices creation on AWS ES by @tomas-kubin in #5502
- docs(ingest): note that Athena doesn't support lineage by @hsheth2 in #6081
- fix(ingest): alias for mssql-odbc source by @hsheth2 in #6080
- fix(ingest): presto-on-hive - Setting display name properly by @treff7es in #6065
- fix(schema filter): fix schema infinite rerender by @gabe-lyons in #6082
- feat(monitoring): track graphql errors in metrics by @szalai1 in #6087
- feat(advanced search): Add component to show all advanced search filters & add new filter by @gabe-lyons in #6058
- fix(ingest): bump lkml version by @hsheth2 in #6091
- fix(ingest): lookml - extract column correctly by @shirshanka in #6093
- feat(retention): change default policy, add API to apply retention by @anshbansal in #6088
- fix(lineage): fix missed casing in lineage registry by @gabe-lyons in #6078
- fix(ingest): bigquery-beta - Lowering a bit memory footprint of bigquery usage by @treff7es in #6095
- feat(ingest): remove hardcoded env variable default for cli version by @shirshanka in #6075
- docs: add information about mapping ports for datahub-gms by @shirshanka in #6092
- chore(deps): upgrade graphql-java deps to 19.0 by @shirshanka in #6099
- chore(deps): upgrade neo4j to 4.4.x by @shirshanka in #6101
- feat(docs): Improve documentation about Search by @szalai1 in #5889
- feat(ingest): add async option to ingest proposal endpoint by @RyanHolstien in #6097
- chore(deps): upgrade opentelemetry dependencies by @shirshanka in #6100
- refactor(recommendations): Bump default max recommendations count for Platforms by @jjoyce0510 in #6113
- feat(ingest): add Sandbox support by @rgudic in #6105
- fix(mae): use JAVA_TOOL_OPTIONS instead of JDK_JAVA_OPTIONS by @szalai1 in #6114
- feat(advanced-search): Complete Advanced Search: backend changes & tying UI together by @gabe-lyons in #6068
- feat(search): improved search snippet FE logic by @gabe-lyons in #6109
- feat(ingest): add CorpUser and CorpGroup to the Python SDK by @ttaubermarshall-stripe in #5930
- fix(ingest): hide deprecated path_spec option from config by @hsheth2 in #5944
- feat(posts): add posts feature to DataHub by @aditya-radhakrishnan in #6110
- fix(ingest): remove unused mysql golden file by @hsheth2 in #6106
- fix(ingestion): fix percent change computation in stale_entity_removal by @rslanka in #6121
- refactor(ingest): use pydantic utilities for NamingPattern by @hsheth2 in #6013
- fix(ingest): presto-on-hive - not failing on Hive type parsing error by @treff7es in #6118
- fix(ingest): ignore usage and operation for snowflake datasets withou… by @mayurinehate in https://github.com...
DataHub v0.8.45
Release Highlights
User Experience
- Allow Term Groups to be the target of permissions
- Customize browser favicon via REACT_APP_FAVICON_URL param
- Some UX improvements for charts & dashboards entity pages to reduce confusion
- Performance improvements on the lineage visualization
- Search bar for dataset schema tab
Developer Experience
- Add rest endpoint for restoring indices of a single entity (/aspects?action=restoreIndices)
- Create new platform instances via CLI
- Improved impact analysis performance due to an added caching layer
- Support for Patch as seen in August 2022 town hall.
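For the new restore-indices endpoint, a request could be assembled roughly as follows. This sketch only builds the request and does not send it. The path `/aspects?action=restoreIndices` comes from the note above; the host and port, the `urn` payload field, the content type, and the helper name are assumptions, and a real deployment may also require an authorization header.

```python
import json
from urllib.request import Request


def build_restore_indices_request(gms_url: str, urn: str) -> Request:
    """Build (but do not send) a POST request for a single entity.

    Assumption: the endpoint accepts a JSON body with a `urn` field.
    """
    body = json.dumps({"urn": urn}).encode("utf-8")
    return Request(
        url=f"{gms_url.rstrip('/')}/aspects?action=restoreIndices",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local GMS address and sample dataset URN.
request = build_restore_indices_request(
    "http://localhost:8080/",
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
)
```

Passing the request to `urllib.request.urlopen` would send it; check the endpoint's expected parameters against your DataHub version first.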
Metadata Ingestion
- Introduces bigquery-beta source
- Looker source memory usage dramatically reduced
- Report memory usage during ingestion
- Improve Tableau lineage
- Usage statistics for Tableau
- LookML can automatically clone your Git repository. LookML is now supported in UI-based ingestion.
- dbt supports column-level meta mappings
- Support for deletion & rollback of time series data
- Upgrade to browse path forms
[see next page for list of commits]
What's Changed
- fix(privileges) Add Term Groups as targetable entities for privileges by @chriscollins3456 in #5806
- fix(javadocs): remove ampersand from pdl causing issue in doc generation for openapi by @RyanHolstien in #5808
- chore(ingest): remove archived docs by @hsheth2 in #5793
- feat(ingest): add rewrite option for metadata file check by @hsheth2 in #5763
- feat(cli): add support for sampled reporting to keep logs manageable by @shirshanka in #5800
- docs(refactor): Refactor Tags Feature Guide by @maggiehays in #5781
- docs(feature-guide) Impact Analysis by @maggiehays in #5765
- feat(theming): set custom favicon via env var by @gabe-lyons in #5810
- test(smoke-test): check debug arg in executor requests by @hsheth2 in #5811
- fix(ingest): bigquery-beta - Fixing dependencies by @treff7es in #5814
- feat(ingest): looker - reduce memory requirements by @shirshanka in #5815
- feat(restore-indices): add endpoint for restore indices, add basic check for graph by @anshbansal in #5805
- fix(frontend): download node only when USE_SYSTEM_NODE is set to false by @szalai1 in #5817
- doc: Make Airflow link clickable by @daha in #5803
- feat(ingest):looker - reduce mem usage, misc reporting improvements by @shirshanka in #5823
- feat(model, ingest): populate sizeInBytes in snowflake, fall back to table level profiling for large tables by @mayurinehate in #5774
- chore(docker): make curl/wget commands quiet in docker by @hsheth2 in #5819
- chore: cleanup references to the old ember app by @hsheth2 in #5797
- fix(ingest): spark-lineage: Adding additional debug logs to spark lineage by @treff7es in #5772
- fix(docker): add missing port mappings for non-neo4j quickstart by @hsheth2 in #5799
- fix(ingest): looker - report dashboard scanning correctly by @shirshanka in #5829
- feat(cli): report memory usage during ingest by @shirshanka in #5828
- fix(ingest): presto-on-hive - Fixing mysql filter by @treff7es in #5825
- docs(big query): add needed delete permission to list by @maaaikoool in #5826
- chore(ingest): set isort combine_as_imports by @hsheth2 in #5820
- fix(ingest): use AwsConnectionConfig instead of AwsSourceConfig by @hsheth2 in #5813
- feat(ingest): looker test connection by @hsheth2 in #5768
- feat(ingest): improve tableau lineage, workbooks query, fix pagination by @mayurinehate in #5756
- fix(ingest): profiling - memory usage reduction by @shirshanka in #5830
- feat(monitoring): enable JMX and OTEL for frontend pods by @szalai1 in #5834
- fix(standalone-consumers): Exclude Solr from spring boot application config & make them run on M1 by @pedro93 in #5827
- feat(hooks): Add toggle for enabling/disabling platform event hook by @pedro93 in #5840
- feat(transformers): Add semantics & transform_aspect support in transformers by @mohdsiddique in #5514
- feat(ci): auto label PRs by @anshbansal in #5839
- feat(inputs): improving clarity on inputs for dashboards by @gabe-lyons in #5841
- feat(ingest): add utility for converting MCEs to MCPs by @hsheth2 in #5812
- chore(smoke): add additional log in smoke test by @hsheth2 in #5842
- fix(ingest): fix doc generation import ordering issue with postgres by @hsheth2 in #5846
- feat(docker) Adds Sasl support to base ingestion image by @pedro93 in #5855
- fix(graphql) Fix null pointer exception when fetching entity aspect via graphql by @chriscollins3456 in #5857
- fix(ingest): reporting should work with timestamps by @shirshanka in #5860
- fix(patch-entity-registry): Remove exception for entities with key aspects. by @pghazanfari in #5831
- fix(browse): Fixing browse path to remove requirement for simple name suffix by @jjoyce0510 in #5634
- fix(ingest): bigquery - Fixing sharded regexp pattern config by @treff7es in #5861
- perf(elastic search graph service): improving perf of lineage query by @gabe-lyons in #5858
- chore(ingest): remove outdated GE compatibility hack by @hsheth2 in #5862
- ci(ingest): test with python 3.10 by @hsheth2 in #5863
- docs: improve doc generation, add better docs for snowflake, looker by @shirshanka in #5867
- feat(ci): tweak auto-label globs by @anshbansal in #5849
- fix(m1): preflight works with brew postgres@14 by @shirshanka in #5868
- feat(smoke-tests) Make smoke tests use standalone consumers by @pedro93 in #5856
- fix(domains): adding 10,000+ text when domain list caps out elastic count capacity by @gabe-lyons in #5838
- docs(notifications): slack notification docs by @anshbansal in #5871
- feat(docker): Update Dockerfiles to use java 11 runtime by @pedro93 in #5853
- Scroll issue on Glossary related entity page by @Ankit-Keshari-Vituity in #5804
- fix(ingest): include urns in rest sink failure logs by @hsheth2 in #5848
- fix(docker): Bumps JRE 11 to latest by @pedro93 in #5875
- feat(ingest): support reading config file from stdin by @hsheth2 in #5847
- fix(ingest): remove dbt delete_tests_as_datasets option by @hsheth2 in #5865
- fix(ingest): avrogen handling for missing fields with default values by @hsheth2 in #5844
- refactor(ingest): add ALL_ENV_TYPES constant by @hsheth2 in #5866
- feat(cli) Make docker compose quiet by @pedro93 in #5869
- feat(datahub-protobuf): add support for shadow jar, publish by @shirshanka in #5882
- feat(jars): better jar versioning for datahub-client, spark-lineage and protobuf by @shirshanka in #5883
- fix(dev-docker): set right context for frontend dev build by @szalai1 in #5885
- fix(ci): fix jar release action dependencies by @shirshanka in #5884
- feat(schema) Add search filter to Schema tab by @chriscollins3456 in #5845
- feat(ui) Add ...
DataHub v0.8.44
Release Highlights
Known Issues
Standalone Kafka Consumers
We have identified that standalone Kafka consumers (MCP/MCL messages) have been broken since v0.8.44. The root cause is a set of Spring bean dependencies that were not correctly excluded.
This went undetected in our testing infrastructure because, until recently, our tests did not run with standalone consumers.
The underlying issue has been fixed by #5827, and since #5856 we run all our smoke tests with standalone consumers to prevent this from happening in the future. The fix will be released in v0.8.46.
[Helm] DataHub Actions Container
We recently rolled out support for running ingestion in debug mode. This requires a bump in the datahub-actions container to either HEAD (latest) or v0.0.7. The correct version is set as the default in v0.2.103.
User Experience
- Improvements to UI-based ingestion: view live logs during execution, view ingestion summary (i.e., number of entities ingested), and rollback functionality. Also surfaces CLI-run ingestion jobs.
- New look on Homepage: Domains have been promoted to the top of the fold, so they are listed above Entity cards and Platform cards
- Improvements to searching for Looker resources - when searching for a measure or dimension, we will now surface Looks & Dashboards that reference those fields
- The DataHub Docs Site has a new look! We are reorganizing content to make it easier and more intuitive for DataHub Developers and End-Users alike to navigate our resources.
- Improved Error Handling on the UI - much nicer messaging when exceptions are caught by the frontend application.
- Misc minor bug fixes and improvements
Developer Experience
- Eternal personal access tokens are now supported
- Deprecated support for Python 3.6 (we expect this to have little-to-no impact on the Community based on pip download data)
Metadata Ingestion
- Improved documentation for Domains transformer
- Stateful Ingestion now supported for Glue
- data-lake Source has been deprecated in favor of s3 source
- Chart Entity now supports chartUsageStatistics
- dbt ingestion supports auto-extracting owner from the meta block
- Improved Snowflake Connector is now available; we expect this to provide a reduction in ingestion run-time and lower levels of complexity
What's Changed
- chore(ingest): remove orderedset dependency by @hsheth2 in #5591
- refactor(ingest): simplify upgrade version stats by @hsheth2 in #5588
- feat(metadata-service-auth): add support for eternal personal access tokens by @ksrinath in #5433
- fix(ci): paths for github workflows by @anshbansal in #5595
- fix(ingest): Fix ingest Clickhouse without password by @liyuhui666 in #5511
- fix(ci): cleanup sleeps to instead use retries by @anshbansal in #5597
- Kafka form Addition and resolved confilict by @Ankit-Keshari-Vituity in #5598
- fix(ingest): Fix minor logging bug in the glue source. by @rslanka in #5605
- fix(ci): use different image for smoke base image by @anshbansal in #5607
- fix(ci): cancel docker-unified workflow only on PRs on new commits by @anshbansal in #5608
- fix(ci): add env variable for creds smoke test by @anshbansal in #5609
- fix(ui) Followups to recent changes to UI ingestion forms by @chriscollins3456 in #5602
- docs(transformers): Add domain transformer documentation in transformers readme by @mohdsiddique in #5606
- feat(model): adding status aspect to assertions by @shirshanka in #5612
- fix(ingest): use default telemetry ID when config is unwritable by @hsheth2 in #5614
- chore(ingest): drop python 3.6 support by @hsheth2 in #5521
- fix(ui): Split based on Data Platform delimiter in Lineage viz by @jjoyce0510 in #5613
- feat(search): Sticky search filters + misc bug fixes & improvements by @jjoyce0510 in #5601
- fix(graphql): handle null source values in ml features & primary keys by @gabe-lyons in #5626
- fix(graph service): only query for entities that should have lineage [Breaking Change] by @gabe-lyons in #5539
- feat(model): Add optional message field to auditstamp by @gabe-lyons in #5611
- fix(ingest): fix indenting issue in azure ad connector by @aditya-radhakrishnan in #5627
- feat(tokens) Create and display non-expiring tokens on the frontend by @chriscollins3456 in #5630
- Schema tab: Fixed the header issue by @Ankit-Keshari-Vituity in #5622
- build(docs-website): only show release notes for recent releases by @hsheth2 in #5621
- docs(README): update links and reorg content by @maggiehays in #5618
- perf(operations): performance improvement to operations tab via reduced fetching by @gabe-lyons in #5632
- feat(ui) Retrieve last ingested timestamp and display on frontend by @chriscollins3456 in #5600
- Update README.md and maintaining consistency by @hemanthkotaprolu in #5623
- fix(ingest): fix delta-lake dict iteration bug by @hsheth2 in #5625
- fix(ingest): okta - make async loop init more robust by @shirshanka in #5640
- fix(ingest): cli - handle exception in upgrade check by @shirshanka in #5641
- build(docs-website): make codegen script idempotent by @hsheth2 in #5620
- docs(airflow): fix formatting by @hsheth2 in #5617
- fix(ui): Fixing minor search redirect filtering issue introduced by sticky filters by @jjoyce0510 in #5643
- fix(ingestion): Update developer docs by @szalai1 in #5644
- feat(ui): Adding slack handle to corp group info by @jjoyce0510 in #5645
- fix(delta-table): allow env, credential file based s3 auth by @MugdhaHardikar-GSLab in #5636
- feat(GraphQL API): Add "browsePaths" field to browsable entity types by @jjoyce0510 in #5646
- feat(ingest): generate a list of aspects in codegen by @hsheth2 in #5633
- feat(ingestion): Glue stateful ingestion by @amanda-her in #5553
- feat(ingest): add snowflake-beta source by @mayurinehate in #5517
- fix(ingest): remove alphabet field from allow/deny config by @hsheth2 in #5629
- feat(mssql): add multi database ingest support by @MugdhaHardikar-GSLab in #5516
- chore(ingest): drop data-lake source in favor of s3 source by @hsheth2 in #5628
- fix(ingest): use mongodb ping command to test connection by @hsheth2 in #5650
- fix(ingest): remove profile_sql_table event by @hsheth2 in #5616
- fix(ci): use graphql instead of restli by @anshbansal in #5610
- feat(ingest): rest_emitter - Adding option to disable ssl by @szalai1 in #5642
- feat(ingest): GE Profile/Action Trino support by @aezomz in #5361
- Stats Tab: Table and column stats hide when there is no data by @Ankit-Keshari-Vituity in #5651
- fix(ingest): redash - fix redash dashboard url bug by @de-kwanyoung-son in #5500
- Glossary: Worked on the refetching data issue by @Ankit-Keshari-Vituity in #5638
- feat(ingestion) Fetch live logs on an ingestion run from UI by @chriscollins3456 in #5653
- fix(spark-lineage): Create application setup on sqlevent start by @MugdhaHardikar-GSLab in #5657
- fix(ui) Remove constraint for searching with less than 3 characters by @chriscollins3456 in #5654
- docs: adds ABLY as DataHub adopter by @de-...
DataHub v0.8.43
Highlights
User Experience
- Bulk edit support - you can now add or remove Owners, Glossary Terms, Tags, Domains, Deprecation Status to multiple entities with a few clicks!
- Improved user experience to create secrets and ingestion schedules
Developer/Community Experience
- A new Java-based file emitter, generating a JSON file that can be used in the “File” metadata ingestion source
- Delta Lake fixes to make it more stable and to extract table history to populate the operation aspect
Metadata Ingestion
- When ingesting metadata from the DataHub UI, you will now see an “Ingestion Run Summary” which shows the run outcome, number of entities successfully ingested, and the ability to download logs collected during the run
- New Dataset Domain Transformer - assign a Domain to Datasets during ingestion
Full Commit Log
What's Changed
- #5577 @jjoyce0510 feat(ui): Add rich UI ingestion run summary
- #5330 @liyuhui666 feat(ingest): clickhouse - add metadata modification time and data size
- #5582 @jjoyce0510 feat(ui): Support batch deleting from ui
- #5531 @Jiafi Fix profiling when using {table}.
- #5548 @Jiafi Expose catalog_name in athena.py
- #5335 @mohdsiddique feat(ingest): power-bi - make ownership ingestion optional
- #5587 @aditya-radhakrishnan fix(groups): fix user, search, and preview group membership to be fetched for both external and native group memberships
- #5586 @xiphl feat(ui): make container description searchable and have description show up in results
- #5585 @mohdsiddique fix apache ranger plugin readme file rendering
- #5277 @MugdhaHardikar-GSLab feat(ingest): delta-lake - extract table history into operation aspect
- #5584 @shirshanka fix(ingest): moving delta-lake connector to be 3.7+ only
- #5526 @MugdhaHardikar-GSLab fix(ingest): sql-common - db2, snowflake bug fixes to extract table descriptions
- #5566 @hsheth2 feat(ingest): infer aspectName from aspect type in MCP
- #5578 @MugdhaHardikar-GSLab feat(datahub-client): add java file emitter
- #5328 @Santhin feat(ingest): dbt - control over emitting test_results, test_definitions, etc.
- #5547 @hsheth2 fix(ingest): handle when current server version is unavailable
- #5579 @chriscollins3456 feat(ingestion) Add Save & Run button to managed ingestion builder
- #5558 @anshbansal feat(test): add read-only smoke tests
- #5581 @maggiehays chore(gradle): update node version for docs site
- #5580 @jjoyce0510 fix(ui): Fixing batch set domains bug
- #5574 @chriscollins3456 feat(ingestion) Implement secrets in new managed ingestion form
- #4976 @noahfournier feat(graphql): add MutableTypeBatchResolver
- #5572 @jjoyce0510 feat(ui): Support batch deprecation from the UI (Batch actions part 6/7)
- #5575 @gabe-lyons extending assertion std model
- #5560 @jjoyce0510 feat(ui): Batch set & unset Domain for assets via the UI
- #5571 @anshbansal chore(build): tweak stale issue timing
- #5570 @anshbansal fix(gms): missing directory for gms
- #5569 @anshbansal fix(ci): flaky smoke test fix
- #5568 @anshbansal fix(gms): ensure directory is present
- #5562 @gabe-lyons (chore): upgrading ingestion to 0.8.42
- #5551 @hsheth2 fix(ingest): activate mypy support for ParamSpec typing annotation
- #5563 @gabe-lyons chore(0.8.42): update breaking changes doc
- #5456 @mohdsiddique feat(transformers): Add domain transformer for dataset
- #5564 @hsheth2 fix(ingest): fix some typos and logging issues
- #5444 @xiphl feat(ingest) Allow ingestion of Elasticsearch index template
- #5541 @ms32035 fix(ingestion): correct trino datatype handling
- #5559 @chriscollins3456 feat(ingestion) Update managed ingestion scheduler to be easier to use
- #5552 @jjoyce0510 feat(ui): Batch add & remove Owners to assets via the UI
v0.8.42
Highlights
User Experience
- Improved Search Experience - preview cards now display usage and freshness information
- Update to Schema History - incorporated Community feedback to remove “Blame” terminology
- Improved UI-Based Ingestion - easily configure metadata ingestion from Snowflake, BigQuery, Looker, and Tableau with an easy-to-follow form; YAML is still supported!
Developer/Community Experience
- Python 3.6 is no longer supported for ingestion – we expect this to impact fewer than 1% of DataHub users (based on PyPI download stats). Please upgrade to Python 3.7 or newer.
- Update to GitHub Issue management - issues will be marked as “Inactive” after 30 days of no activity and will be automatically closed following an additional 30 days of inactivity
- We’ve updated our Slack Guidelines! Read them here
Metadata Ingestion
- You can now test your Snowflake connection via the CLI and UI-based Ingestion to ensure you have proper access levels required for general ingestion, profiling, and usage. We will be expanding this functionality to other cloud-based ingestion sources in upcoming cycles.
- Hard delete will now discover and remove soft deleted entities
- Resolved issue of assertion error with dbt stateful ingestion
Full Commit Log
What's Changed
- feat(quickstart,docs): updates for v0.8.41 by @anshbansal in #5409
- fix(ingest): ensure upgrade checks run async by @shirshanka in #5383
- fix(ingest): pass transport options to usage history looker api calls by @mayurinehate in #5417
- feat(quickstart): moving to official confluent images for m1 by @shirshanka in #5416
- fix(documentation) Fix erratic cursor in documentation editor bug by @chriscollins3456 in #5411
- feat(ui): Supporting enriched search preview + misc improvements by @jjoyce0510 in #5419
- chore: remove unnecessary modules from codebase by @shirshanka in #5420
- fix(ingest): extract usage for dashboards allowed by pattern by @mayurinehate in #5424
- fix(docker): fix kafka-setup command to support same capabilities as … by @shirshanka in #5428
- fix(protobuf): ownership fixes by @leifker in #5425
- fix(ui): add dataset qualifiedName parameter to lineage query by @alexey-kravtsov in #5427
- fix(glossary) Fix dropdown where disabled buttons are still clickable by @chriscollins3456 in #5430
- docs(bigquery): add changelog and unittest for profiling limits by @MugdhaHardikar-GSLab in #5407
- fix(siblings): fixing lineage fetching for siblings & sources by @gabe-lyons in #5415
- fix(ui): Fixing unreleased search preview bugs by @jjoyce0510 in #5432
- feat(ui): Adding Statistics Summary to Dataset + Dashboard Profiles by @jjoyce0510 in #5440
- feat(ingest): add test source connection feature, structured report file by @shirshanka in #5442
- fix(ingest/glue): handle error when generating s3 tags for virtual view tables by @timcosta in #5398
- feat(ingest): model - adding a small extension to support communicati… by @shirshanka in #5429
- fix(bigquery-usage): fix dataset name for sharded table by @MugdhaHardikar-GSLab in #5412
- feat(ingestion) Add new endpoint to test an ingestion connection by @chriscollins3456 in #5438
- feat(cli,build): remove deprecated variables GMS_HOST/_PORT by @anshbansal in #5451
- fix(search): make filters by default an empty list if null by @aditya-radhakrishnan in #5454
- fix(hive): add column comment as a column description by @MugdhaHardikar-GSLab in #5449
- feat(groups): add native groups concept to DataHub by @aditya-radhakrishnan in #5443
- fix(ingest): fix serialization of report to handle nesting by @shirshanka in #5455
- fix(tableau): fix tableau db error, add more logs by @mayurinehate in #5423
- build(deps): bump terser from 5.9.0 to 5.14.2 in /docs-website by @dependabot in #5448
- feat(doc): spark-lineage - Adding spark lineage configuration doc for Amazon EMR by @treff7es in #5459
- feat(schema-history): remove blame language for the schema history feature by @aditya-radhakrishnan in #5457
- Search header: Menu icon alignment by @Ankit-Keshari-Vituity in #5458
- build(deps): bump terser from 4.8.0 to 4.8.1 in /datahub-web-react by @dependabot in #5446
- feat(ingest): snowflake - basic test connection capability by @shirshanka in #5464
- fix(ingest/trino): Avoid exception if $properties table empty or not readable by @glinmac in #5447
- feat(ingest): preflight - Add way to check/upgrade brew package version in preflight if needed by @treff7es in #5435
- fix(build): add base image with gradle wrapper cached by @anshbansal in #5467
- doc(bigquery): groups grants by requirements by @sgomezvillamor in #5468
- fix(docs,build): remove base image not needed, cleanup docs by @anshbansal in #5469
- feat(ui): Partial support for Chart usage by @jjoyce0510 in #5473
- fix(ingest): bigquery: multiproject profiling fix by @treff7es in #5474
- fix(ingest): kafka - revert deps back to < 1.9.0 by @shirshanka in #5476
- feat(docker): support multiplatform image for datahub-upgrade by @shirshanka in #5477
- feat(quickstart): experimental support for backup restore for quickstart by @shirshanka in #5418
- feat(dbt): updating source lineage logic by @gabe-lyons in #5414
- Ingestion: Added form in Big Query type to edit the queries. by @Ankit-Keshari-Vituity in #5431
- docs: fix docsearch config by @hsheth2 in #5479
- Search Results: Added checkbox option to select multiple results at once. by @Ankit-Keshari-Vituity in #5422
- feat(delete): hard delete deletes soft deleted entities by @anshbansal in #5478
- fix(docs): add missing closing marker for note section by @shirshanka in #5480
- fix(build): intermittent failure in github actions by @anshbansal in #5452
- feat(model, ingest): add user email in dashboard user usage counts by @mayurinehate in #5471
- feat(ingest): add support for capability report in snowflake test connection by @mayurinehate in #5472
- feat(build): automatically mark issues as stale to close inactive issues by @anshbansal in #5482
- fix(ingest): loosen confluent-kafka dep requirement by @hsheth2 in #5489
- refactor(ingest): cleanup importlib.import_module calls by @hsheth2 in #5490
- build(ingest): make gradle build less chatty by @hsheth2 in #5491
- fix(ingest): Fixing dbt trino datatypes by @aezomz in #5379
- refactor(ci): use custom action for checking codegen status by @hsheth2 in #5493
- feat(spark-lineage): Support ssl cert disable functionality by @MugdhaHardikar-GSLab in #5488
- docs(auth): fix link to point to new doc by @anshbansal in #5501
- docs(updating-datahub): add note for breaking change in looker usage … by @mayurinehate in #5499
- fix(ingest): cleanup unused flake8 noqa statements by @hsheth2 in #5492
- refactor(ci): refactor Docker build-and-push workflows by @hsheth2 in #5494
- docs(slack) Update to Slack guidelines by @maggiehays in #5504
- feat(cli): dele...
v0.8.41
Highlights
User Experience
- Performance improvements in the UI
- Improvements in CSV connector for easier ingestion - description, ownership, domain support added
- UI form for Snowflake Managed Ingestion so you don't have to make changes in YAML
- Viewing Siblings
Developer Experience
- Ability to stop quickstart instead of nuking
- Customizing mapped ports in quickstart
- New models for dashboard usage
- Circuit breaker and python api for Assertion and Operation
Metadata Ingestion
- Improvements in bigquery connector to only profile some tables
- Intermittent 401 errors during ingestion fixed
- New salesforce connector
What's Changed
- fix(test): add cleanup in tests, make urls configurable by @anshbansal in #5287
- fix(docs,quickstart): release related changes for 0.8.40 by @anshbansal in #5299
- [Deployment]: fix config typo on confluent cloud by @tengis in #5293
- fix(cli): suppress secrets in stacktraces by @anshbansal in #5302
- refactor(ui): Fix settings page divider by @jjoyce0510 in #5292
- fix(cli): timeline - category should be owner not ownership by @shirshanka in #5304
- perf(siblings): reduce data fetched by siblings in lineage by @gabe-lyons in #5308
- fix(ingest): bigquery - Fix for bigquery error when there was no bigquery catalog specified by @treff7es in #5303
- fix(ui) Fix entity profile sidebar width issues by @chriscollins3456 in #5305
- perf(search): Improve search default performance by @jjoyce0510 in #5311
- perf(ui): Performance improvements and misc refactorings in the UI by @jjoyce0510 in #5310
- Modified the drop down of Menu Items by @Ankit-Keshari-Vituity in #5301
- fix(validation) Fail validation error silently instead of crashing by @chriscollins3456 in #5314
- feat(docs) Add documentation on authorization & authentication by @pedro93 in #5265
- fix(ui) Make profile icon clickable to expand header menu by @chriscollins3456 in #5317
- refactor(ui): Extract searchable page into its own component (perf + ux) by @jjoyce0510 in #5318
- fix(gms) Remove auto-creating status aspect if not present when ingesting by @pedro93 in #5315
- fix(ui): Add missing SearchRoutes component by @jjoyce0510 in #5321
- feat(ingest): Ingest Looker dashboard create/update/delete timestamps by @mayurinehate in #5312
- fix(ui): Fix pipeline tasks list loading by @jjoyce0510 in #5332
- feat(ingest): lookml - adding support for only emitting reachable vie… by @shirshanka in #5333
- fix(ingest): omit schema fields when name is absent by @mayurinehate in #5275
- fix(siblings) Combine siblings data but remove duplicate data by @chriscollins3456 in #5337
- Fix typo in metadata-ingestion.md by @dougpm in #5338
- fix(me) Cache the me query for performance reasons by @chriscollins3456 in #5316
- fix(tokens) Adds non-admin tests for access tokens by @pedro93 in #5174
- feat(bigquery): support size, rowcount, lastmodified based table selection for profiling by @MugdhaHardikar-GSLab in #5329
- chore: Refactor Python Codebase by @koconder in #5113
- docs(bigquery): profiling report enhancement by @MugdhaHardikar-GSLab in #5342
- feat(ingest): update CSV source to support description and ownership type by @aditya-radhakrishnan in #5346
- Fixed UI issue: Tags list going outside the container by @Ankit-Keshari-Vituity in #5341
- feat(ingest): add salesforce connector by @mayurinehate in #5104
- feat(bootstrap): create abstract class UpgradeStep to abstract away upgrade logic by @aditya-radhakrishnan in #5349
- fix(bigquery-usage): dataset name fix for sharded tables by @MugdhaHardikar-GSLab in #5347
- docs(features): update grammar on Features overview by @maggiehays in #5350
- fix(ci): fix mysql and kafka-connect ingestion test by @shirshanka in #5352
- feat(ui): add copy function for stats table sample value by @ngamanda in #5331
- fix(ui) Correct show/hide tabs in Settings based on privileges by @chriscollins3456 in #5355
- fix(siblings): add useMutationUrn to domain section by @gabe-lyons in #5270
- feat(schema) Show last observed timestamp in the schema tab by @chriscollins3456 in #5348
- fix(glossary) Fixes a bug for yaml ingested terms without source_url by @chriscollins3456 in #5356
- feat(lineage) Add Lineage tab to Chart and Dashboard entity profiles by @chriscollins3456 in #5357
- fix(cassandra): fix Cassandra queries used by IngestDataPlatformInstancesStep by @justinas-marozas in #5199
- refactor(ui): Use createTag mutation for creating new tags from the UI by @jjoyce0510 in #5359
- Added recommendation on group modal by @Ankit-Keshari-Vituity in #5362
- refactor(ui): Remove unnecessary fields in GraphQL queries by @jjoyce0510 in #5358
- feat(ingest) - add audit actor urn to auditStamp by @neojunjie in #5264
- feat(ingest): Domain ingestion usability by @shirshanka in #5366
- fix(config): fixes config key in DataHubAuthorizerFactory by @sgomezvillamor in #5371
- fix(ingest): domains - check whether urn based domain exists during r… by @shirshanka in #5373
- feat(quickstart): Adding env variables and cli options for customizing mapped ports in quickstart by @NavinSharma13 in #5353
- fix(build): tweak ingestion build by @anshbansal in #5374
- feat(query) Add get_entity_v2 to python package by @aezomz in #5255
- fix(airflow): Fix for failing serialisation when Param was specified + support for external task sensor by @treff7es in #5368
- fix(users): fix to not get invite token unless the invite token modal is visible by @aditya-radhakrishnan in #5380
- fix(gms): Propagate token cache error by @pedro93 in #5381
- fix(bootstrap): skip ingesting data platforms that already exist by @aditya-radhakrishnan in #5382
- fix(cli): respect server telemetry settings correctly by @treff7es in #5384
- fix(ingest): bigquery - Graceful bq partition id date parsing failure by @treff7es in #5386
- feat(airflow): Circuit breaker and python api for Assertion and Operation by @treff7es in #5196
- feat(kafka-setup): add options for sasl_plaintext by @abiwill in #5385
- fix(bigquery): multi-project GCP setup run query through correct project by @anshbansal in #5393
- fix(bigquery): add storage project name by @anshbansal in #5395
- Add Changes to support smoke test on Datahub deployed on kubernetes Cluster by @NavinSharma13 in #5334
- fix(PlayCookie) PLAY_TOKEN cookie rejected because userprofile exceeds 4096 chars by @neojunjie in #5114
- feat(dashboards): add datasets field to DashboardInfo aspect by @Masterchen09 in #5188
- feat(siblings): allow viewing siblings separately by @gabe-lyons in #5390
- Added Cursor pointer to tags by @Ankit-Keshari-Vituity in #5389
- feat(GMS): Adding Dashboard Usage Models by @jjoyce0510 in #5399
- fix(q...
v0.8.40
Highlights
Fixes bug in 0.8.39 that prevented standalone MAE consumers from being deployed.
User Experience
- Support for deleting Tags and Domains via the UI
- Support for editing Domain name via the UI
- Visualize Glossary Term source on the Glossary Term Entity Page
Developer Experience
- Fix for issue where standalone MAE consumers could not be deployed
Metadata Ingestion
- Script to re-index sibling associations for dbt nodes that had already been ingested before 0.8.39
What's Changed
- feat(search) Allow users to update the number of search results per page by @chriscollins3456 in #5212
- feat(build): add base image for ingest by @anshbansal in #5243
- feat(ingest): working with multiple bigquery projects by @anshbansal in #5240
- fix(build): missing libs by @anshbansal in #5254
- fix(build): use correct creds by @anshbansal in #5261
- feat(ingest): redshift - Option to define path spec for Redshift lineage generation by @treff7es in #5256
- fix(ui): Enable previews properly when browsing for DataJob by @MikeSchlosser16 in #5250
- fix(docs): Fix acronym on mxe docs by @MikeSchlosser16 in #5249
- fix(ui): Support deleting references to glossary terms / nodes, users, assertions, and groups by @jjoyce0510 in #5248
- feat(docs) add links in quickstart for adding users by @pedro93 in #5267
- fix(siblings) Display sibling assertions in Validations tab by @chriscollins3456 in #5268
- Feat(domain) Add ability to edit a Domain name from the UI by @chriscollins3456 in #5266
- Delta lake base by @MugdhaHardikar-GSLab in #5259
- fix(siblings) Update the names of siblings utils args for readability by @chriscollins3456 in #5269
- docs(adopters): add showroomprive and n26 as DataHub adopters by @maggiehays in #5271
- feat(glossary) Add Source section to sidebar for Glossary Terms by @chriscollins3456 in #5262
- fix(delta-lake): fix dependency issue for snowflake due to s3_util by @MugdhaHardikar-GSLab in #5274
- fix(ingest): s3 - Remove unneeded methods from s3_util by @MugdhaHardikar-GSLab in #5276
- Selector recommendations in Owner, Tag and Domain Modal by @Ankit-Keshari-Vituity in #5197
- fix(security) Sanitize rich text before sending to backend or rendering on frontend by @chriscollins3456 in #5278
- feat(GraphQL): Support for Deleting Domains, Tags via GraphQL API by @jjoyce0510 in #5272
- feat(build): reduce build time for ingestion image by @anshbansal in #5225
- fix(ingestion): profiling - Fixing partitioned table profiling in BQ by @treff7es in #5283
- fix(ingest) redshift: Adding missing dependencies and relaxing sqlalchemy dependency by @treff7es in #5284
- fix(ingestion): Reverting sqlalchemy upgrade because it caused issues with mssql and redshift-usage by @treff7es in #5289
- fix(Siblings): Have sibling hook use entity client by @gabe-lyons in #5279
- Show message when related glossary terms are empty. by @Ankit-Keshari-Vituity in #5285
- docs(adopter): add Digital Turbine as DataHub adopter by @maggiehays in #5290
- Update schema-registry docker.env by @liyuhui666 in #5231
- feat(siblings): index sibling aspects for historical dbt metadata by @gabe-lyons in #5291
- feat(ui) Adding support for deleting Tags and Domains via the UI by @jjoyce0510 in #5280
Full Changelog: v0.8.39...v0.8.40
v0.8.39
Release Highlights
Known Issues
This release does not work with stand-alone MAE consumers (mae-consumer-job); this has been resolved in v0.8.40.
User Experience
- NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
- NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
- Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)
Developer Experience
- NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your datahub metadata server by utilizing Kafka as a highly available message bus
Metadata Ingestion
- NEW: Make bulk edits to your metadata via CSV (read more)
- Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
- Managed ingestion update: removed need for sink block
What's Changed
- fix(ui-ingestion): update looker ingestion warning banner by @aditya-radhakrishnan in #5142
- chore: Bump Default UI Ingestion Version 0.8.38 by @jjoyce0510 in #5145
- feat(schema): support rendering schemas with "." in field names by @gabe-lyons in #5141
- feat(dbt): Platform instances for target platform by @skrydal in #5129
- feat(ingest): snowflake profile tables only if they have been updates… by @mayurinehate in #5132
- fix(airflow): fixes DeprecationWarning with hook-class-names by @sayakmaity in #5143
- feat(frontend): Parse JWT access token claims by @chen4119 in #5138
- fix(tokens): Using keyword search filters for ListAccessTokensResolver by @jjoyce0510 in #5154
- feat(ui) Update the max text length of Terms/Term Groups by @chriscollins3456 in #5162
- docs(policies): add info about Manage User Credentials by @aditya-radhakrishnan in #5157
- fix(restore-indices): Do not fail on MAE row count diff by @dexter-mh-lee in #5165
- fix(Kafka-setup): Make sure it doesn't fail when the new envs are not set by @dexter-mh-lee in #5168
- chore(deps): Bump Nimbus Jose JWT dependency by @pedro93 in #5158
- fix(recs): Verify that an entity exists before recommending by @jjoyce0510 in #5163
- fix(business glossary): setting properties to be empty if the node has no properties aspect by @gabe-lyons in #5166
- refactor(ui): Misc improvements to Dataset Assertions UI by @jjoyce0510 in #5155
- chore(guava): force version of guava in client jars per #5134 by @RyanHolstien in #5153
- feat(boot): Make Glossary Term Upgrade Async by @jjoyce0510 in #5164
- fix(frontend): Add iam auth jar to frontend by @dexter-mh-lee in #5171
- docs(features): update & clean up Features page by @maggiehays in #5175
- fix(glue): fix glue profiling config option by @kangseonghyun in #5178
- feat(upgrade) Check version when determining to run RestoreGlossaryIndices step by @chriscollins3456 in #5182
- fix(jaas): fixed auth.jaas.enabled option parsing by @alexey-kravtsov in #5179
- feat(ingestion): bigquery - Option to send usage queries as well as Operational metadata by @treff7es in #5151
- feat(build): changes to decrease build time, cancel runs in case of multiple commits by @anshbansal in #5187
- refactor(docs): Update Metadata Events Docs by @jjoyce0510 in #5173
- fix(ingest): If there is no manager for a LDAP user (example: system account) by @bda618 in #5180
- bug(ingest): correct case of sys views for mssql description populati… by @BALyons in #5186
- refactor(configs): Simplify Kafka Topic name configurations + docs by @jjoyce0510 in #5198
- feat(ingest): dbt - adding support for dbt tests by @shirshanka in #5201
- fix(cli): correct handling of env variables by @anshbansal in #5203
- feat(ci): split integration tests to reduce run time by @anshbansal in #5205
- feat(datahub-client): add java kafka emitter by @MugdhaHardikar-GSLab in #5074
- feat(graphql): add metrics capturing for graphql latency by @RyanHolstien in #5200
- test(ingestion): bigquery-usage - Adding tests for bigquery usage filters by @treff7es in #5195
- fix(ui): load monaco-editor as a dependency and not from a third party CDN by @Masterchen09 in #5189
- feat(cli): Add token parameter for sample ingestion by @pedro93 in #5160
- feat(lineage) Update Lineage tab and Impact Analysis feature by @chriscollins3456 in #5121
- fix(ingest): add missing ownership types by @afghori in #5209
- feat(ingestion) ldap: make ldap attrs keys configurable by @atulsaurav in #4682
- Remove unnecessary space from application.yml of GMS by @mmmeeedddsss in #5216
- fix(upgrade): fix upgrade when s3 path has = by @RyanHolstien in #5220
- feat(docs) Add and update docs for the new Glossary experience by @chriscollins3456 in #5211
- feat(glossary) Add empty state for the Business Glossary home page by @chriscollins3456 in #5217
- feat(bootstrap): add bootstrap step to clear out unknown aspect rows from the database by @RyanHolstien in #5148
- feat(ingest): adds csv enricher ingestion source by @aditya-radhakrishnan in #5221
- fix(build): pin confluent kafka dependency by @anshbansal in #5224
- fix(ingest): databricks - ingest structs correctly through hive by @shirshanka in #5223
- feat(dbt): add sibling association logic to associate dbt elements with their target systems by @gabe-lyons in #5190
- feat(tableau): use pagination for all connection queries by @mayurinehate in #5204
- Handling 404 page not found by @Ankit-Keshari-Vituity in #5227
- refactor(UI): Refactor Dataset Health Status by @jjoyce0510 in #5222
- fix(dbt-test): Inconsistency in assertions by @Santhin in #5214
- feat(ingest): remove need for sink block in UI based ingestion by @anshbansal in #5208
- fix(ingest): bigquery - Grouping date named tables at bigquery by @treff7es in #5230
- Add check for 0 rows when profiling datasets from s3 by @Jiafi in #5219
- [bug fix]: disabled create buttons by @xiphl in #5234
- fix(ingest): bigquery - Handling gracefully sql parser error in bq lineage by @treff7es in #5238
- fix(ingest): do not dump password by @anshbansal in #5235
- feat(ingest): dbt - improving dbt_meta mapping by @shirshanka in https://github.com/datahub-project/data...
[!] DataHub v0.8.38
Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.
The below release notes are copied from v0.8.37 release notes.
Highlights
User Experience
This release comes packed full of new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support Snowflake ingest via OAuth
- Misc fixes and improvements to existing ingestion sources
Disclaimers:
With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and lets those users log in.
If you were previously disabling BOTH JaaS (via AUTH_JAAS_ENABLED=false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
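The interaction between these flags can be sketched roughly as follows. This is an illustrative model of the disclaimer only, not DataHub's actual frontend code: the helper names are hypothetical, the parsing of "true"/"false" strings is an assumption, and so is the default of JaaS being enabled when AUTH_JAAS_ENABLED is unset. OIDC is left out because it does not prompt for a username and password on the DataHub login form.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Hypothetical helper: read a boolean flag from an environment mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def password_login_enabled(env: Mapping[str, str]) -> bool:
    """Return True if a username + password login would be required.

    Assumptions: JaaS defaults to enabled when unset; native authentication
    defaults to enabled (as stated in the release note).
    """
    jaas = _flag(env, "AUTH_JAAS_ENABLED", default=True)
    native = _flag(env, "AUTH_NATIVE_ENABLED", default=True)
    return jaas or native


# A deployment that already disabled JaaS still gets a password prompt after
# upgrading, because native authentication is on by default.
before = {"AUTH_JAAS_ENABLED": "false"}
after = {"AUTH_JAAS_ENABLED": "false", "AUTH_NATIVE_ENABLED": "false"}
print(password_login_enabled(before), password_login_enabled(after))
```

In this model, setting AUTH_NATIVE_ENABLED=false alongside the existing JaaS setting is what restores the previous behavior of not asking for credentials.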
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in #5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in #5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in #5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in #5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in #5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in #5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in #5086
- chore(dep): upgrade json-smart by @RyanHolstien in #5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in #5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in #5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in #5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in #4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in #5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in #5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in #5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in #4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in #5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in #5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in #5057
- feat(oidc): add configurable read timeout by @RyanHolstien in #5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in #5063
- fix(profiling): don't stop if some steps fail by @anshbansal in #5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in #5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in #5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in #5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in #4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in #5100
- fix(build): reduce time taken for resolution by @anshbansal in #5106
- fix(build): remove dependencies added for compatibility by @anshbansal in #5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in #5109
- Policies page issue by @Ankit-Keshari-Vituity in #5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in #5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in #5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in #5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in #5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in #5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in #5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in #5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in #5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in #5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in #5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in #5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in #5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in #5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in #5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in #4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in #5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in #5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in #5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in #5136
- fix(ingest): fix table urn for athena connectionType by @mayurinehate in #5135
- Fixed the UI issue on Deprecated Pop-Up issue by @Ankit-Keshari-Vituity in #5130
- fix(ui-ingestion): show warning banner when configuring looker ui-ingestion for the first time by @aditya-radhakrishnan in #5139
- fix(tokens): Fix stale cache problem, reduce cache timeout for access tokens + fix listing owner tokens by @jjoyce0510 in #5140
Full Changelog: v0.8.37...v0.8.38