Is your feature request related to a problem? Please describe.
The query DQ feature provides output as boolean values. A value of FALSE tells us that something went wrong in the query validation, but users won't know what; they have to rerun the queries manually to figure out the difference. If a production support team is on call, it is highly unlikely they are aware of the validation scripts, which makes it tough to get actionable insights out of the query DQ feature. If the query results of both the source and the target were fetched and stored in a custom stats table, users could build actionable insights or work items from those results.
Describe the solution you'd like
Right now, it is programmed to pass one query to QueryDQ. Instead, we could pass three queries, as below.
select X from table1; select Y from table2; select x=y from t1 join t2
Queries are separated by semicolons. A single query keeps the default behaviour; with three queries, the behaviour is as follows.
X and Y are the values to be compared between source and target respectively.
The third query is the validation query. If the validation returns FALSE, we can fetch the X and Y values and store them as JSON in a custom stats table. The custom table is user managed and should be passed as an argument, as below.
SparkExpectations(custom_dq_info_table = ""...)
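The proposed behaviour can be sketched in plain Python. This is a minimal illustration only: the helper names `split_querydq_queries` and `build_failure_record` are hypothetical and not part of the SparkExpectations API, and it assumes naive semicolon splitting (real SQL with semicolons inside string literals would need a proper parser).

```python
import json


def split_querydq_queries(querydq: str) -> list:
    """Split a semicolon-delimited query_dq expression into its parts.

    One part keeps the default single-query behaviour; three parts are
    interpreted as source query, target query, and validation query.
    """
    parts = [q.strip() for q in querydq.split(";") if q.strip()]
    if len(parts) not in (1, 3):
        raise ValueError(f"expected 1 or 3 delimited queries, got {len(parts)}")
    return parts


def build_failure_record(rule: str, source_rows, target_rows) -> str:
    """Serialize the source and target query outputs as a JSON record,
    ready to be written to the user-managed custom stats table when the
    validation query returns FALSE."""
    return json.dumps(
        {
            "rule": rule,
            "source_output": source_rows,
            "target_output": target_rows,
        }
    )
```

With the example from above, `split_querydq_queries("select X from table1; select Y from table2; select x=y from t1 join t2")` yields the three queries, and `build_failure_record` packages the fetched X and Y values for storage.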
Describe alternatives you've considered
We are currently implementing the above option as a separate module and using it alongside the other features of SparkExpectations.
Additional context
The custom table is user managed; permissions and related concerns have to be handled by the user.
The number of rows stored in the custom stats table could initially be restricted to 200.
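The proposed row cap could look like the following sketch (the constant and function names are hypothetical; 200 is the initially suggested limit from this request):

```python
# Hypothetical cap on how many result rows are persisted per rule
# in the custom stats table; 200 is the suggested initial limit.
MAX_CUSTOM_STATS_ROWS = 200


def cap_custom_stats_rows(rows, limit=MAX_CUSTOM_STATS_ROWS):
    """Truncate the query result rows before writing them
    to the user-managed custom stats table."""
    return rows[:limit]
```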
Connected with @asingamaneni and @jskrajareddy21 on this enhancement. Below are the bug fixes and feature requests coupled with this enhancement request:
The querydq should execute as is with multiple delimited queries.
As there is an ask to send the contents of the detailed stats table to Kafka, any data stored in the detailed stats table has to be masked before it is sent to Kafka.
There should not be any limitation on the number of delimited query_dq queries.
Handle edge cases where one of the delimited query_dq queries can return an int or a float.
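The masking requirement above could be sketched as follows. The function name, the set of sensitive keys, and the redaction marker are all assumptions for illustration, not part of SparkExpectations:

```python
def mask_detailed_stats(record, sensitive_keys=("source_output", "target_output")):
    """Return a copy of a detailed-stats record with the sensitive
    fields redacted, so the record can be published to Kafka without
    exposing the underlying query results."""
    masked = dict(record)
    for key in sensitive_keys:
        if key in masked:
            masked[key] = "***MASKED***"
    return masked
```

A producer would then serialize and send `mask_detailed_stats(record)` instead of the raw record, keeping the unmasked values only in the user-managed stats table.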