Feature: Add Row Level Result Treatment Options for Minimum and Maximum #535

Merged: 7 commits, Feb 21, 2024

@eycho-am commented Feb 20, 2024

Issue #, if available: N/A

Description of changes:

  • Follow-up to PR #532 ("Feature: Add Row Level Result Treatment Options for Uniqueness and Completeness")
  • Adding AnalyzerOptions to use FilteredRowOutcome for Minimum and Maximum analyzers
  • Adding AnalyzerOptions to use FilteredRowOutcome for the Compliance analyzer
  • Adding AnalyzerOptions to use FilteredRowOutcome for the PatternMatch analyzer
  • Adding AnalyzerOptions to use FilteredRowOutcome for MinLength and MaxLength analyzers
    • Note: Refactored NullBehavior AnalyzerOptions logic in Min and Max Length analyzers to work with FilteredRowOutcome
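
How the where filter, NullBehavior, and FilteredRowOutcome might interact for a length check can be illustrated with a small Spark-free model. This is a sketch under assumptions, not the deequ implementation: the object name, the case objects, and minLengthRowResult are hypothetical, and the meaning given to each null behavior is an assumed reading of the options named above.

```scala
// Simplified, Spark-free model; every name below is illustrative, not the deequ API.
object LengthRowLevelSketch {
  sealed trait NullBehavior
  case object Ignore extends NullBehavior       // assumed: a null value stays null
  case object EmptyString extends NullBehavior  // assumed: a null value counts as length 0
  case object Fail extends NullBehavior         // assumed: a null value fails the check

  sealed trait FilteredRowOutcome
  case object OutcomeNull extends FilteredRowOutcome // rows outside the filter report null
  case object OutcomeTrue extends FilteredRowOutcome // rows outside the filter report true

  // Row-level result of a "length >= minLength" check for a single row.
  // `inFilter` says whether the row satisfies the `where` clause.
  def minLengthRowResult(
      value: Option[String],
      inFilter: Boolean,
      minLength: Int,
      nullBehavior: NullBehavior,
      outcome: FilteredRowOutcome): Option[Boolean] = {
    if (!inFilter) {
      // The filter treatment is decided first, independently of null handling.
      outcome match {
        case OutcomeNull => None
        case OutcomeTrue => Some(true)
      }
    } else {
      value match {
        case Some(s) => Some(s.length >= minLength)
        case None =>
          nullBehavior match {
            case Ignore      => None
            case EmptyString => Some(0 >= minLength)
            case Fail        => Some(false)
          }
      }
    }
  }
}
```

In this model the filter treatment and the null treatment are separate decisions, which is the kind of separation the refactoring note describes.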

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@eycho-am changed the title from "Feature/row level filter min max" to "Feature: Add Row Level Result Treatment Options for Minimum and Maximum" Feb 20, 2024

@rdsharma26 left a comment

Great work, @eycho-am !

Comment on lines 77 to 78
case _ =>
  criterion

Can we add a comment here noting that Null needs no special treatment, because that is already the default behavior when using where?
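
The reasoning in this comment can be sketched with a plain-Scala model of a where-guarded selection. It is only an illustration under assumptions: criterion and rowLevel here are simplified stand-ins for the analyzer members, and treatFilteredAsTrue and placeholder are hypothetical parameters that do not appear in the PR.

```scala
// Spark-free model; Option stands in for a nullable column value.
object WhereDefaultSketch {
  // A `where`-guarded selection: rows outside the filter already yield None (null).
  def criterion(value: Option[Double], inFilter: Boolean): Option[Double] =
    if (inFilter) value else None

  // Only the "treat filtered rows as passing" case needs an override.
  // `placeholder` is a hypothetical value chosen so that a filtered row passes.
  def rowLevel(
      value: Option[Double],
      inFilter: Boolean,
      treatFilteredAsTrue: Boolean,
      placeholder: Double): Option[Double] = {
    if (treatFilteredAsTrue && !inFilter) Some(placeholder)
    else criterion(value, inFilter) // default branch: null for filtered rows comes for free
  }
}
```

Because the guarded selection already produces null for rows outside the filter, the default branch can fall through to it unchanged.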

}
}

private def transformColForNullBehavior(col: Column, nullBehavior: NullBehavior): Column = {

Can we use column here directly instead of passing it in as col? In rowLevelResults, we use analyzerOptions directly instead of passing it in as a parameter.

private[deequ] def rowLevelResults: Column = {
val filteredRowOutcome = getRowLevelFilterTreatment(analyzerOptions)
val whereNotCondition = where.map { expression => not(expr(expression)) }
val expression = when(regexp_extract(col(column), pattern.toString(), 0) =!= lit(""), 1)

Can we put when(regexp_extract(col(column), pattern.toString(), 0) =!= lit(""), 1) in a common place and reuse it in criterion and here?
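
One way to picture the suggested refactoring is a single helper that both the aggregate criterion and the row-level results call. The sketch below uses scala.util.matching.Regex instead of Spark columns, so it is a model of the idea rather than the analyzer code; matchFlag and the object name are hypothetical, and returning 0 for a non-match is an assumption about the part of the expression not shown in the snippet.

```scala
import scala.util.matching.Regex

// Spark-free model of sharing one match expression between two call sites.
object PatternMatchSketch {
  // Shared helper: 1 when the pattern finds a non-empty match, otherwise 0 (assumed).
  def matchFlag(value: String, pattern: Regex): Int =
    if (pattern.findFirstIn(value).exists(_.nonEmpty)) 1 else 0

  // Aggregate view: fraction of values with a match.
  def criterion(values: Seq[String], pattern: Regex): Double =
    if (values.isEmpty) 0.0
    else values.map(matchFlag(_, pattern)).sum.toDouble / values.size

  // Row-level view: one boolean per value, built from the same helper.
  def rowLevelResults(values: Seq[String], pattern: Regex): Seq[Boolean] =
    values.map(matchFlag(_, pattern) == 1)
}
```

Keeping the match logic in one place means a later change to the matching rule cannot make the aggregate metric and the row-level results disagree.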

src/main/scala/com/amazon/deequ/checks/Check.scala (outdated, resolved)

@rdsharma26 left a comment

LGTM

@eycho-am merged commit 44df61d into awslabs:master Feb 21, 2024
1 check passed
eycho-am added a commit that referenced this pull request Feb 21, 2024
…um (#535)

* Address comments on PR #532

* Add filtered row-level result support for Minimum, Maximum, Compliance, PatternMatch, MinLength, MaxLength analyzers

* Refactored criterion for MinLength and MaxLength analyzers to separate rowLevelResults logic
eycho-am added a commit that referenced this pull request Feb 22, 2024
rdsharma26 pushed a commit that referenced this pull request Apr 16, 2024
rdsharma26 pushed a commit that referenced this pull request Apr 16, 2024
rdsharma26 pushed a commit that referenced this pull request Apr 16, 2024
rdsharma26 pushed a commit that referenced this pull request Apr 17, 2024
rdsharma26 pushed a commit that referenced this pull request Apr 17, 2024

2 participants