Replies: 3 comments 5 replies
-
Row checks should perform well at that scale. Are you applying any dataset-level checks or using any custom SQL (e.g. for SQL expression checks or filter clauses)?
-
Hi,
Looking at one of the contracts, this is the breakdown of check functions:
| Function name  | Number of instances |
|----------------|---------------------|
| is_in_list     | 64                  |
| sql_expression | 43                  |
| is_unique      | 2                   |
(Sent by email in reply to Greg Hansen's comment of Mon, 4 May 2026 at 22:09.)
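A breakdown like the table above can be produced directly from a contract's checks. The sketch below assumes the checks are held in DQX's metadata (dict/YAML) layout, with the function name stored under `check` → `function`; the helper name `tally_check_functions` and the sample checks and their arguments are hypothetical and only illustrative.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tally_check_functions(checks: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many checks use each check function.

    Assumes each check is a dict with the function name stored under
    check["check"]["function"] (DQX-style metadata layout).
    """
    return Counter(c["check"]["function"] for c in checks)


# Hypothetical sample; a real contract would be loaded from YAML/JSON.
# Argument names here are illustrative only.
sample_checks = [
    {"criticality": "error",
     "check": {"function": "is_in_list",
               "arguments": {"column": "status", "allowed": ["A", "B"]}}},
    {"criticality": "error",
     "check": {"function": "is_in_list",
               "arguments": {"column": "region", "allowed": ["EU", "US"]}}},
    {"criticality": "warn",
     "check": {"function": "sql_expression",
               "arguments": {"expression": "amount >= 0"}}},
    {"criticality": "error",
     "check": {"function": "is_unique",
               "arguments": {"columns": ["id"]}}},
]

# Print functions from most to least used.
for name, count in tally_check_functions(sample_checks).most_common():
    print(f"{name:<20}{count}")
```

Seeing the distribution this way makes it easier to spot which function dominates the rule set before profiling it.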
-
Hi there, jumping in to add some additional context to this issue. In the sample run:
In the current iteration of the workflow, we split the 131 rules into 21 cross-field rules that are applied by DQX and
We apply the DQX checks with the following
Additionally, we notice significant performance degradation when writing the results of DQX to a Delta quarantine table. Are there any best practices or suggestions for writing DQX results to a Delta table? Is there any additional instrumentation available to break down planning/compilation time vs. execution time inside DQX? Please let me know if any additional metrics/information would be helpful. Thanks very much in advance!
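Whether DQX exposes its own phase timings is not answered in this thread, so the following is a generic sketch, not a DQX feature: a small context manager (the name `timed` is hypothetical) that records wall-clock time per phase. One caveat matters here: Spark transformations are lazy, so applying checks typically only builds the query plan, and the checks actually run when an action such as the Delta write executes. Time attributed to the write can therefore include evaluating the checks themselves. The `time.sleep` calls below are stand-ins for those two phases.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record wall-clock seconds spent inside the block under `label`.

    The timing is stored even if the block raises an exception.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: Dict[str, float] = {}

with timed("build_plan", timings):
    time.sleep(0.01)  # stand-in for applying checks (lazy: mostly plan building)

with timed("execute_and_write", timings):
    time.sleep(0.05)  # stand-in for the Delta write action that triggers execution

for label, secs in timings.items():
    print(f"{label}: {secs:.3f}s")
```

Forcing an action on the checked DataFrame before the write (and timing it separately) is one way to separate check evaluation from write cost; the per-stage view in the Spark UI gives a finer breakdown than wall-clock timings.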
-
We have an unusual demand on DQX: a data contract that generates over 150 rules. One would expect performance to take a hit, but when we compare against equivalent plain Spark checks, DQX lags noticeably and takes excessive time to process what is a fairly moderate-sized dataset. The most troublesome rules typically use the function "is_in_range". Are there any recommendations or guidance for dealing with this many rules?