@david-leifker david-leifker commented Oct 9, 2025

Introduce a new unified entity index that consolidates the 100+ per-entity indices into a few indices. It will eventually also replace the system metadata index. The feature is disabled by default, and this PR contains only the write path.

Major Features

  • Unified Search Indexes: Partial implementation supporting v3 entity index management, settings/mappings, and dual writes (current v2 & v3)
  • Search Index Groups: New entity registry configuration for managing multiple search index entity groups
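
As a rough illustration of the consolidation idea, the sketch below resolves an entity name to either a per-entity index or a shared group index. Everything in it is an assumption for illustration: the class `IndexGroupSketch`, the group names `primary` and `system`, the entity-to-group assignments, and both index naming patterns are hypothetical and are not taken from this PR's code or registry configuration.

```java
import java.util.Locale;
import java.util.Map;

final class IndexGroupSketch {
    // Hypothetical entity-to-group assignment; the real entity registry
    // configuration is not shown in this PR description.
    private static final Map<String, String> ENTITY_TO_GROUP = Map.of(
        "dataset", "primary",
        "chart", "primary",
        "dataHubPolicy", "system");

    /** One index per entity (illustrative per-entity naming). */
    static String v2IndexName(String entityName) {
        return entityName.toLowerCase(Locale.ROOT) + "index_v2";
    }

    /** One index per group (hypothetical naming); many entities share it. */
    static String v3IndexName(String entityName) {
        String group = ENTITY_TO_GROUP.get(entityName);
        if (group == null) {
            throw new IllegalArgumentException(
                "No search index group configured for entity: " + entityName);
        }
        return group + "index_v3";
    }

    public static void main(String[] args) {
        for (String entity : new String[] {"dataset", "chart", "dataHubPolicy"}) {
            System.out.println(
                entity + " -> " + v2IndexName(entity) + " / " + v3IndexName(entity));
        }
    }
}
```

In this sketch `dataset` and `chart` resolve to different per-entity indices but to the same group index, which is the reduction in index count the description refers to.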

Key Components Added

  • MultiEntityMappingsBuilder - Unified mapping generation
  • UpdateIndicesV2Strategy & UpdateIndicesV3Strategy - Index update strategies
  • SearchableAnnotationValidator - Validation for search annotations
  • DelegatingMappingsBuilder & DelegatingSettingsBuilder - Delegation patterns
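
For readers unfamiliar with the dual-write approach, here is a minimal sketch of fanning one document update out to every enabled strategy. It assumes a simplified contract: the `IndexUpdateStrategy` interface, `RecordingStrategy`, and `writeAll` are hypothetical names, and the actual signatures of `UpdateIndicesV2Strategy` and `UpdateIndicesV3Strategy` may differ substantially.

```java
import java.util.ArrayList;
import java.util.List;

final class DualWriteSketch {
    /** Hypothetical strategy contract, not the interface used in this PR. */
    interface IndexUpdateStrategy {
        boolean isEnabled();

        void upsert(String entityName, String docId, String documentJson);
    }

    /** Records writes in memory instead of calling a search client. */
    static final class RecordingStrategy implements IndexUpdateStrategy {
        private final String version;
        private final boolean enabled;
        final List<String> writes = new ArrayList<>();

        RecordingStrategy(String version, boolean enabled) {
            this.version = version;
            this.enabled = enabled;
        }

        @Override
        public boolean isEnabled() {
            return enabled;
        }

        @Override
        public void upsert(String entityName, String docId, String documentJson) {
            writes.add(version + ":" + entityName + ":" + docId);
        }
    }

    /** Sends one update to every enabled strategy and returns how many wrote. */
    static int writeAll(
            List<? extends IndexUpdateStrategy> strategies,
            String entityName,
            String docId,
            String documentJson) {
        int written = 0;
        for (IndexUpdateStrategy strategy : strategies) {
            if (strategy.isEnabled()) {
                strategy.upsert(entityName, docId, documentJson);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        RecordingStrategy v2 = new RecordingStrategy("v2", true);
        RecordingStrategy v3 = new RecordingStrategy("v3", true);
        int written = writeAll(List.of(v2, v3), "dataset", "doc-1", "{}");
        System.out.println(written + " strategies wrote the document");
    }
}
```

Keeping each strategy behind an enabled flag mirrors the description's note that the new path is off by default: with the v3 strategy disabled, only the v2 write happens.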

Configuration Changes

  • New entity registry YAML configuration for search index groups

@github-actions github-actions bot added the docs, product, and devops labels Oct 9, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review label Oct 9, 2025

alwaysmeticulous bot commented Oct 9, 2025

✅ Meticulous spotted 0 visual differences across 993 screens tested.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit d937d9e.


codecov bot commented Oct 9, 2025

Bundle Report

Bundle size has no change ✅

# Conflicts:
#	datahub-upgrade/src/main/java/com/linkedin/datahub/upgrade/loadindices/LoadIndices.java
#	datahub-upgrade/src/main/java/com/linkedin/datahub/upgrade/loadindices/LoadIndicesArgs.java
#	datahub-upgrade/src/main/java/com/linkedin/datahub/upgrade/loadindices/LoadIndicesIndexManager.java
#	datahub-upgrade/src/main/java/com/linkedin/datahub/upgrade/loadindices/LoadIndicesStep.java
#	datahub-upgrade/src/main/java/com/linkedin/datahub/upgrade/loadindices/config/LoadIndicesConfig.java
#	datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/UpgradeCliApplicationTestConfiguration.java
#	datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/loadindices/LoadIndicesIndexManagerTest.java
#	datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/loadindices/LoadIndicesStepTest.java
#	datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/loadindices/config/LoadIndicesConfigTest.java
#	docs/how/load-indices.md
#	metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/indexbuilder/ESIndexBuilder.java
#	metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/update/BulkListener.java
#	metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/update/ESBulkProcessor.java
#	metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/update/ESWriteDAO.java
#	metadata-io/src/main/java/com/linkedin/metadata/service/UpdateIndicesService.java
#	metadata-io/src/testFixtures/java/io/datahubproject/test/search/BulkProcessorTestUtils.java
#	metadata-service/factories/src/main/java/com/linkedin/gms/factory/common/RestHighLevelClientFactory.java

Supports:

  • v2/v3 index management
  • v3 settings/mappings
  • Dual v2/v3 writes

Not including:

  • query

fix missing break

@abedatahub abedatahub left a comment

Approving.

I flagged a couple of tests for over-testing. There must be others but I haven't looked.

I think #15073 has proved its usefulness (the guidelines are good, but I haven't verified that Claude Code does the right thing when generating new unit tests).

@david-leifker david-leifker merged commit eeb2c88 into master Oct 29, 2025
80 checks passed
@david-leifker david-leifker deleted the unified-entity-index branch October 29, 2025 00:07

Labels

devops, docs, pending-submitter-merge, product, publish-docker

5 participants