Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Dataquality aspect): Added Data Quality Metrics aspect to emit data quality metrics metadata into Datahub #9265

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

naresh-angala
Copy link

Checklist

  • [ X ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@naresh-angala naresh-angala changed the title feat(Dataquality aspect): Added Data Quality Metrics aspect to emit data quality metrics into Datahub feat(Dataquality aspect): Added Data Quality Metrics aspect to emit data quality metrics metadata into Datahub Nov 17, 2023
@david-leifker david-leifker added the product PR or Issue related to the DataHub UI/UX label Nov 21, 2023
* tool_name attribute to capture the tool that has been used to generate the quality metrics.
* Optional field
*/
tool_name: optional string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use camel case here as well?

Comment on lines +11 to +23
dimensionName: enum DimensionName {
ACCURACY
COMPLETENESS
CONSISTENCY
INTEGRITY
SEMANTIC_CORRECTNESS
UNIQUENESS
VALIDITY
TIMELINESS
RELIABILITY
DUPLICATION
PRECISION
}
Copy link
Contributor

@skrydal skrydal Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried of using a fixed enums list here for possible Dimensions. Similar approach has been used for the OwnershipTypes and it turned out to not be meeting requirements of everyone and did not allow for easy extension via non-fork metadata model extension. Since then the entire OnwershipTypes model has been changed to not use enums for types but rather handle dynamic list of OwnershipTypes (as entities on their own) and link to them using their urns from within the Ownership aspect.
The overall commit introducing the dynamic OwnershipTypes feature can be found here I think:
ea92b86
Please note though that the change is so profound it's an ongoing effort and multiple changes have been done over time to bring the project closer to the usage of new approach while maintaining backward compatibility.
Datahub populates the list of OwnershipTypes during the bootstrap, this PR introduces possibility to define user's own list of ownership types to bootstrap (we found this approach the best to define semi-static list fitting our custom purposes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We, for example, define multiple types of validity ( "is this zip code valid" and "is this value valid according to our business rules").

Comment on lines +42 to +43
PERCENTAGE
NUMERICAL_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am by no means an expert on the subject so forgive my ignorance, but how do you map a numerical value to the "accuracy" and "completeness"? Or would a boolean type be a good addition here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community poc-marathon-dec-2023 product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants