-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Dataquality aspect): Added Data Quality Metrics aspect to emit data quality metrics metadata into Datahub #9265
base: master
Are you sure you want to change the base?
Conversation
* tool_name attribute to capture the tool that has been used to generate the quality metrics. | ||
* Optional field | ||
*/ | ||
tool_name: optional string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use camel case here as well?
dimensionName: enum DimensionName { | ||
ACCURACY | ||
COMPLETENESS | ||
CONSISTENCY | ||
INTEGRITY | ||
SEMANTIC_CORRECTNESS | ||
UNIQUENESS | ||
VALIDITY | ||
TIMELINESS | ||
RELIABILITY | ||
DUPLICATION | ||
PRECISION | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit worried of using a fixed enums list here for possible Dimensions. Similar approach has been used for the OwnershipTypes
and it turned out to not be meeting requirements of everyone and did not allow for easy extension via non-fork metadata model extension. Since then the entire OnwershipTypes
model has been changed to not use enums for types but rather handle dynamic list of OwnershipTypes (as entities on their own) and link to them using their urns from within the Ownership
aspect.
The overall commit introducing the dynamic OwnershipTypes
feature can be found here I think:
ea92b86
Please note though that the change is so profound it's an ongoing effort and multiple changes have been done over time to bring the project closer to the usage of new approach while maintaining backward compatibility.
Datahub populates the list of OwnershipTypes
during the bootstrap, this PR introduces possibility to define user's own list of ownership types to bootstrap (we found this approach the best to define semi-static list fitting our custom purposes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We, for example, define multiple types of validity ( "is this zip code valid" and "is this value valid according to our business rules").
PERCENTAGE | ||
NUMERICAL_VALUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am by no means an expert on the subject so forgive my ignorance, but how do you map a numerical value to the "accuracy" and "completeness"? Or would a boolean type be a good addition here?
Checklist