feat(ingestion/kafka): add description in dataset properties #7974
Add a description to dataset properties, taken from the top-level doc, if the schema type is Avro.
@@ -175,7 +175,7 @@ def test_kafka_source_workunits_no_platform_instance(mock_kafka, mock_admin_clie
 env="PROD",
 )

-# DataPlatform aspect should be present when platform_instance is configured
+# DataPlatform aspect should not be present when platform_instance is configured
don't think this change is correct
Sorry, the comment should read as follows:
"DataPlatform aspect should not be present when platform_instance is not configured."
Right?
We are testing Kafka source workunits with no platform instance here.
/**
 * The native kafka key schema type. This can be AVRO/PROTOBUF/JSON.
 */
keySchemaType: optional string
what was the motivation for adding these two fields? what other alternatives did you consider?
I have two reasons for adding these two fields:
- Clearer metadata: a new user looking at the ingested Kafka metadata cannot tell exactly which schema type is associated with a topic. Being new to this myself, I ran into the same problem.
- Setting the top-level doc field: the task was to set the top-level doc field as the dataset description only if the schema type is AVRO. Since we generate dataset properties in the outer function, i.e. _extract_records, we need the schema type there to add the condition. Hence I added those fields.
Alternatives:
We could have a separate function such as get_description() in kafka.py, but this would duplicate code and call the same metadata-fetching APIs again.
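For readers following the thread, the condition described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the helper name avro_top_level_doc and its parameters are hypothetical, and it assumes the value schema is available as a JSON string alongside its schema type.

```python
import json
from typing import Optional


def avro_top_level_doc(schema_type: str, schema_str: str) -> Optional[str]:
    """Return the top-level `doc` of an Avro schema, or None.

    Hypothetical helper: the name and signature are assumptions made for
    illustration and do not come from the PR.
    """
    # Only Avro schemas are considered; other schema types yield no description.
    if schema_type != "AVRO":
        return None
    try:
        parsed = json.loads(schema_str)
    except json.JSONDecodeError:
        return None
    # A record schema is a JSON object and may carry an optional `doc`.
    # A primitive schema such as "string" parses to a plain str and has none.
    if isinstance(parsed, dict):
        doc = parsed.get("doc")
        return doc if isinstance(doc, str) else None
    return None


value_schema = '{"type": "record", "name": "User", "doc": "User events", "fields": []}'
description = avro_top_level_doc("AVRO", value_schema)
```

Because `doc` is optional, the caller would still need to handle a None result, for example by leaving the dataset description unset.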
# Point to note:
# documentSchema and keySchema both can have the doc, however we are retrieving doc
# from documentSchema and setting it as dataset description.
# doc is optional property in both i.e. documentSchema and keySchema
please make this comment more concise / clear
@shubhamjagtap639 also it looks like the tests are failing
…ub.com:shubhamjagtap639/datahub into kafka-dataset-properties-populate-descriptions
Merge latest code