feat(entity-registry): entity registry plugins #9538

Merged
david-leifker merged 5 commits into master from feat-entity-registry-plugins on Jan 8, 2024

Conversation

david-leifker
Collaborator

@david-leifker david-leifker commented Dec 30, 2023

Add support for various custom plugins:

  • validation
  • RecordTemplate mutators
  • MCL Side Effects
  • MCP Side Effects
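
As a rough illustration of the first two plugin kinds, here is a minimal sketch of a validator and a mutator. The `Validator` and `Mutator` interfaces, the map-based aspect, and the field names are all hypothetical stand-ins, not DataHub's actual plugin classes (see the linked README for those), and the order in which the two are applied here is chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PluginKindsSketch {
  // Hypothetical, simplified plugin shapes; DataHub's real plugin classes differ.
  interface Validator {
    void validate(Map<String, Object> aspect); // throws if the aspect is invalid
  }

  interface Mutator {
    Map<String, Object> mutate(Map<String, Object> aspect); // returns a modified copy
  }

  // Rejects aspects that lack a (hypothetical) "field" entry.
  static final Validator REQUIRE_FIELD = aspect -> {
    if (!aspect.containsKey("field")) {
      throw new IllegalArgumentException("aspect is missing 'field'");
    }
  };

  // Fills in a (hypothetical) default "severity" when none is set.
  static final Mutator DEFAULT_SEVERITY = aspect -> {
    Map<String, Object> copy = new HashMap<>(aspect);
    copy.putIfAbsent("severity", "WARNING");
    return copy;
  };

  // Validate, then mutate. This ordering is an assumption made for the sketch.
  static Map<String, Object> prepare(Map<String, Object> aspect) {
    REQUIRE_FIELD.validate(aspect);
    return DEFAULT_SEVERITY.mutate(aspect);
  }

  public static void main(String[] args) {
    System.out.println(prepare(Map.of("field", "row_count")));
  }
}
```

The mutator copies its input rather than changing it in place, so a failed validation never leaves a half-modified aspect behind.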

Updated Documentation:
https://github.com/datahub-project/datahub/blob/bb05e3e2d60ac4652ce6c96f0b307292a507be6c/metadata-models-custom/README.md#custom-plugins

Added plugin info to the /config endpoint to help debug plugin loading:

{
  "mycompany-dq-model": {
    "0.0.0-dev": {
      "plugins": {
        "validatorCount": 1,
        "mutationHookCount": 1,
        "mcpSideEffectCount": 1,
        "mclSideEffectCount": 1,
        "validatorClasses": [
          "com.linkedin.metadata.aspect.plugins.validation.CustomDataQualityRulesValidator"
        ],
        "mutationHookClasses": [
          "com.linkedin.metadata.aspect.plugins.hooks.CustomDataQualityRulesMutator"
        ],
        "mcpSideEffectClasses": [
          "com.linkedin.metadata.aspect.plugins.hooks.CustomDataQualityRulesMCPSideEffect"
        ],
        "mclSideEffectClasses": [
          "com.linkedin.metadata.aspect.plugins.hooks.CustomDataQualityRulesMCLSideEffect"
        ]
      }
    }
  }
}
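
One way to use this output when debugging is to check that each reported count matches the length of its class list. The sketch below does that on a hand-built copy of the data. `PluginSummary` is a hypothetical record whose fields mirror the JSON keys above; it is not a DataHub class, and fetching or parsing the `/config` response is left out.

```java
import java.util.List;

class PluginConfigCheck {
  // Hypothetical mirror of the "plugins" object; field names follow the JSON keys.
  record PluginSummary(
      int validatorCount,
      int mutationHookCount,
      int mcpSideEffectCount,
      int mclSideEffectCount,
      List<String> validatorClasses,
      List<String> mutationHookClasses,
      List<String> mcpSideEffectClasses,
      List<String> mclSideEffectClasses) {

    // True when every reported count equals the size of its class list.
    boolean isConsistent() {
      return validatorCount == validatorClasses.size()
          && mutationHookCount == mutationHookClasses.size()
          && mcpSideEffectCount == mcpSideEffectClasses.size()
          && mclSideEffectCount == mclSideEffectClasses.size();
    }
  }

  // The values from the example output above, entered by hand.
  static PluginSummary example() {
    String hooks = "com.linkedin.metadata.aspect.plugins.hooks.";
    return new PluginSummary(
        1, 1, 1, 1,
        List.of("com.linkedin.metadata.aspect.plugins.validation.CustomDataQualityRulesValidator"),
        List.of(hooks + "CustomDataQualityRulesMutator"),
        List.of(hooks + "CustomDataQualityRulesMCPSideEffect"),
        List.of(hooks + "CustomDataQualityRulesMCLSideEffect"));
  }

  public static void main(String[] args) {
    System.out.println("consistent: " + example().isConsistent());
  }
}
```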

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product (PR or Issue related to the DataHub UI/UX) and devops (PR or Issue related to DataHub backend & deployment) labels Dec 30, 2023
@shirshanka
Contributor

Needs docs updated

refactor to better support merging
started documentation in models-custom
updated documentation
refactor to reduce generics use
added support for config endpoint
@david-leifker david-leifker changed the title [WIP] feat(entity-registry): entity registry plugins feat(entity-registry): entity registry plugins Jan 7, 2024
mclSideEffect ->
mclSideEffect.apply(List.of(batch), _entityRegistry, systemEntityClient));

for (MCLBatchItem mclBatchItem : Stream.concat(Stream.of(batch), sideEffects).toList()) {
Collaborator

Batching together MCLs is a significant change. Have we considered all of the cases where parallel batches contain conflicting changes? I feel like it would be easy to get ourselves into a deadlock scenario on the Kafka side with this: multiple consumers processing multiple batches that modify the same aspect could repeatedly fail their batches without making progress, and each would have to reprocess its entire batch. With only one MCL processed at a time (the current state) this can still happen (and has), but one of the consumers is guaranteed to make progress, whereas with a batch this is not the case. I'm not seeing rollback logic in here, so it seems like the entire batch gets reprocessed.

Collaborator Author

The batches contain default aspects today, and they do conflict sometimes. This is handled by retry logic in the Elasticsearch processor (and by transaction retries in the Ebean MCP/MCE stage). We already have deadlock recovery logic. The batching in this PR extends it to include those default aspects as usual, as well as any side effect aspects. If there is a deadlock and failure, it would first be retried at the driver level (ES client or Ebean client); after that, for the MCL/MAE consumer, the message would be reprocessed.
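
To make the discussion concrete, here is a minimal sketch of the pattern in the reviewed loop: side effects are applied to an item, the results are appended after the original, and on failure the whole expanded batch is reprocessed. `Item`, `expand`, `processWithRetry`, and the aspect names are hypothetical and only loosely mirror the snippet above; this is not DataHub's consumer or retry code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;

class SideEffectBatchSketch {
  // Hypothetical stand-in for an MCL batch item.
  record Item(String urn, String aspect) {}

  // Original item first, followed by whatever the side effects emit for it.
  static List<Item> expand(Item original, List<Function<Item, Stream<Item>>> sideEffects) {
    Stream<Item> extra = sideEffects.stream().flatMap(effect -> effect.apply(original));
    return Stream.concat(Stream.of(original), extra).toList();
  }

  // Replays the whole batch after a failure; there is no partial rollback.
  // Returns the number of attempts used.
  static int processWithRetry(List<Item> batch, Consumer<Item> sink, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        batch.forEach(sink);
        return attempt;
      } catch (RuntimeException e) {
        if (attempt == maxAttempts) {
          throw e;
        }
      }
    }
    throw new IllegalArgumentException("maxAttempts must be >= 1");
  }

  public static void main(String[] args) {
    Item original = new Item("urn:li:dataset:example", "exampleAspect");
    // Hypothetical side effect: emit one extra aspect for the same URN.
    Function<Item, Stream<Item>> effect =
        item -> Stream.of(new Item(item.urn(), "sideEffectAspect"));
    List<Item> batch = expand(original, List.of(effect));

    List<Item> writes = new ArrayList<>();
    boolean[] failedOnce = {false};
    // Simulated sink that fails once on the side-effect item, then succeeds.
    Consumer<Item> flakySink = item -> {
      if (item.aspect().equals("sideEffectAspect") && !failedOnce[0]) {
        failedOnce[0] = true;
        throw new IllegalStateException("simulated conflict");
      }
      writes.add(item);
    };

    int attempts = processWithRetry(batch, flakySink, 3);
    System.out.println("attempts=" + attempts + ", writes=" + writes.size());
  }
}
```

In this sketch the original item is written once per attempt, so replaying a batch is only safe if the writes are idempotent. That is the property the reviewer's question about rollback depends on.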

@ajoymajumdar
Contributor

The documentation talks about leveraging these features for custom aspects. What about system-defined aspects?

@david-leifker
Collaborator Author

@ajoymajumdar - These plugins allow modification of custom or built-in aspects. It should be noted that certain changes to built-in aspects can of course render your system inoperable.

@david-leifker david-leifker merged commit 8415fc2 into master Jan 8, 2024
38 checks passed
@david-leifker david-leifker deleted the feat-entity-registry-plugins branch January 8, 2024 20:20
Labels
devops (PR or Issue related to DataHub backend & deployment), product (PR or Issue related to the DataHub UI/UX)
4 participants