feat(entity-registry): entity registry plugins #9538
Conversation
Add support for various custom plugins:
* validation
* RecordTemplate mutators
* MCL side effects
* MCE side effects
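As a rough sketch of what a validation plugin contract can look like (the interface and class names below are hypothetical, not the actual DataHub entity-registry API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an aspect-validation plugin; names are illustrative
// only and do not match the real entity-registry plugin classes.
public class ValidatorSketch {
  interface AspectValidator {
    // Returns human-readable problems; an empty list means the aspect is valid.
    List<String> validate(String aspectName, String payload);
  }

  // Example plugin: rejects empty payloads for any aspect.
  static class NonEmptyValidator implements AspectValidator {
    @Override
    public List<String> validate(String aspectName, String payload) {
      List<String> problems = new ArrayList<>();
      if (payload == null || payload.isBlank()) {
        problems.add(aspectName + ": payload must not be empty");
      }
      return problems;
    }
  }

  public static void main(String[] args) {
    AspectValidator v = new NonEmptyValidator();
    System.out.println(v.validate("datasetProperties", "").size());   // 1
    System.out.println(v.validate("datasetProperties", "{}").size()); // 0
  }
}
```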
entity-registry/src/main/java/com/linkedin/metadata/aspect/plugins/hooks/MutationHook.java
entity-registry/src/test_plugins/mycompany-full-model/0.0.1/entity-registry.yaml
Needs docs updated.
* updated documentation
* refactor to reduce generics use
* added support for config endpoint
```java
mclSideEffect ->
    mclSideEffect.apply(List.of(batch), _entityRegistry, systemEntityClient));

for (MCLBatchItem mclBatchItem : Stream.concat(Stream.of(batch), sideEffects).toList()) {
```
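The `Stream.concat` pattern above iterates the original batch item followed by every item produced by side-effect plugins. A standalone illustration of that shape, with plain strings standing in for `MCLBatchItem`:

```java
import java.util.List;
import java.util.stream.Stream;

public class ConcatSketch {
  // Mirrors the PR snippet's shape: the original batch item first, then
  // every side-effect item, merged into one list to iterate over.
  static List<String> withSideEffects(String batch, Stream<String> sideEffects) {
    return Stream.concat(Stream.of(batch), sideEffects).toList(); // Java 16+
  }

  public static void main(String[] args) {
    for (String item : withSideEffects("original-mcl",
        Stream.of("side-effect-1", "side-effect-2"))) {
      System.out.println(item);
    }
  }
}
```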
Batching together MCLs is a significant change. Have we considered all of the cases where parallel batches contain conflicting changes? It would be easy to get into a deadlock scenario on the Kafka side: multiple consumers processing multiple batches that modify the same aspect could repeatedly fail their batches without making progress, reprocessing the entire batch each time. With only one MCL processed at a time (the current state) this can still happen (and has), but one of the consumers is guaranteed to make progress, whereas with a batch this is not the case. I'm not seeing rollback logic in here, so it seems like the entire batch gets reprocessed.
The batches contain default aspects today and they do conflict sometimes. This is handled by retry logic in the Elasticsearch processor (and transaction retries in the Ebean MCP/MCE stage), so we already have deadlock recovery logic. The batching in this PR extends it to cover those default aspects as usual, plus any side-effect aspects. If there is a deadlock and failure, it is retried first at the driver level (ES client or Ebean client), and then for the MCL/MAE consumer the message is reprocessed.
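The retry-on-conflict behavior described above can be sketched generically. This is not the actual Elasticsearch or Ebean driver code, just an illustration of retrying a unit of work until it succeeds or attempts run out:

```java
import java.util.function.Supplier;

public class RetrySketch {
  // Hypothetical conflict exception, standing in for an ES version conflict
  // or an Ebean transaction deadlock.
  static class ConflictException extends RuntimeException {}

  // Run the operation up to maxAttempts times; on conflict the whole unit
  // of work is retried, and the last failure is rethrown if it never succeeds.
  static <T> T withRetries(int maxAttempts, Supplier<T> op) {
    ConflictException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (ConflictException e) {
        last = e;
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Fails twice, then succeeds: models a batch that conflicts and is reprocessed.
    String result = withRetries(5, () -> {
      if (++calls[0] < 3) throw new ConflictException();
      return "applied";
    });
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```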
Documentation talks about leveraging these features for custom aspects. What about system-defined aspects?
@ajoymajumdar - These plugins allow modification of custom or built-in aspects. Note that certain changes to built-in aspects can, of course, render your system inoperable.
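For illustration only (the names below are hypothetical and do not match the actual `MutationHook` API), a mutator chain that rewrites an aspect payload before it is written might look like:

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class MutationSketch {
  // Hypothetical mutation pipeline: applies each mutator in order to the
  // payload and returns the possibly-modified result. Illustrative only.
  static String applyMutators(String payload, List<UnaryOperator<String>> mutators) {
    String current = payload;
    for (UnaryOperator<String> m : mutators) {
      current = m.apply(current);
    }
    return current;
  }

  public static void main(String[] args) {
    // Example mutator: normalize a tag URN to lower case before write.
    UnaryOperator<String> lowerCase = String::toLowerCase;
    System.out.println(applyMutators("URN:LI:TAG:PII", List.of(lowerCase)));
  }
}
```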
Add support for various custom plugins:
Updated Documentation:
https://github.com/datahub-project/datahub/blob/bb05e3e2d60ac4652ce6c96f0b307292a507be6c/metadata-models-custom/README.md#custom-plugins
Added /config info for plugin loading debugging.