(enhancement): Extend get and update ruleset DQ methods #1882

jaidisido · 2022-12-16T17:12:05Z

Feature or Bugfix

Enhancement

Detail

get_ruleset should accept multiple rulesets and combine them into a single DataFrame.
update_ruleset currently just overwrites the existing ruleset. Introduce a mode argument (overwrite, upsert) to support upserts.
Wrap the get methods in try_it to handle request throttling.
Add tests.
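
For readers unfamiliar with the throttling point, here is a minimal sketch of a retry wrapper with jittered exponential backoff. It is not the library's `try_it` implementation: `ThrottlingError`, `retry_on_throttle`, and all parameter names are hypothetical and exist only for illustration.

```python
import random
import time
from typing import Any, Callable, Tuple, Type


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling error raised by a service client."""


def retry_on_throttle(
    func: Callable[..., Any],
    exceptions: Tuple[Type[BaseException], ...] = (ThrottlingError,),
    max_tries: int = 3,
    base_delay: float = 0.5,
    **kwargs: Any,
) -> Any:
    """Call func(**kwargs), retrying on the given exceptions with backoff."""
    for attempt in range(max_tries):
        try:
            return func(**kwargs)
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the original error
            # Wait a random time of at most base_delay * 2**attempt seconds.
            time.sleep(random.uniform(0, base_delay * 2**attempt))
```

A get call would then be wrapped as `retry_on_throttle(fetch, name="my_ruleset")`, where `fetch` is whatever function performs the request.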

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jaidisido · 2022-12-16T17:13:05Z

awswrangler/data_quality/_create.py

@@ -134,6 +136,8 @@ def update_ruleset(
        Ruleset name.
    updated_name : str
        New ruleset name if renaming an existing ruleset.
+    mode : str
+        overwrite (default) or upsert.


overwrite and upsert are the only modes I can think of

kukushking (Dec 27, 2022): How about append?

jaidisido · 2022-12-16T17:13:37Z

awswrangler/data_quality/_create.py

+    if mode not in ["overwrite", "upsert"]:
+        raise exceptions.InvalidArgumentValue("`mode` must be one of 'overwrite' or 'upsert'.")
+
+    if mode == "upsert":


Could not find a better way to do an upsert in pandas
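
One common way to express an upsert in pandas is to concatenate the existing and incoming rows and keep the last occurrence of each key. The sketch below is not necessarily what this PR does. The key columns `rule_type` and `parameter` and the helper name `upsert_rules` are assumptions made for illustration.

```python
import pandas as pd

# Assumed key columns identifying a rule; adjust to the real schema.
KEY_COLUMNS = ["rule_type", "parameter"]


def upsert_rules(existing: pd.DataFrame, incoming: pd.DataFrame) -> pd.DataFrame:
    """Return existing rules with incoming rules inserted or updated by key."""
    combined = pd.concat([existing, incoming], ignore_index=True)
    # Incoming rows come last, so keep="last" lets them replace matching keys.
    deduplicated = combined.drop_duplicates(subset=KEY_COLUMNS, keep="last")
    return deduplicated.reset_index(drop=True)
```

Rows whose key appears only in `existing` are kept unchanged, rows whose key appears in both take the incoming values, and new keys are appended.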

jaidisido · 2022-12-16T17:14:24Z

awswrangler/data_quality/_get.py

+    for ruleset_name in ruleset_names:
+        rules = cast(str, _get_ruleset(ruleset_name=ruleset_name, boto3_session=boto3_session)["Ruleset"])
+        df = _rules_to_df(rules=rules)
+        if len(ruleset_names) > 1:


I think a column with the ruleset name should only be added if there are multiple rulesets. But let me know if you disagree.

What's the major downside of always adding the ruleset name? I'd like the structure of the DataFrame to be consistent (i.e. it should either always have the ruleset column or never have it).

LeonLuttenberger · 2022-12-16T17:24:58Z

awswrangler/data_quality/_get.py

+    for ruleset_name in ruleset_names:
+        rules = cast(str, _get_ruleset(ruleset_name=ruleset_name, boto3_session=boto3_session)["Ruleset"])
+        df = _rules_to_df(rules=rules)
+        if len(ruleset_names) > 1:


What's the major downside of always adding the ruleset name? I'd like the structure of the DataFrame to be consistent (i.e. it should either always have the ruleset column or never have it).
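
To make the "always add the column" option concrete, here is a small sketch that tags every per-ruleset DataFrame with its name before concatenating, so the output schema is the same for one ruleset or many. The function name `combine_rulesets` and the column name `ruleset` are assumptions for illustration, not names taken from the PR.

```python
from typing import Dict, List

import pandas as pd


def combine_rulesets(rules_by_name: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-ruleset DataFrames, always adding a 'ruleset' column."""
    frames: List[pd.DataFrame] = [
        df.assign(ruleset=name)  # same schema whether one ruleset or several
        for name, df in rules_by_name.items()
    ]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame(columns=["ruleset"])
    return pd.concat(frames, ignore_index=True)
```

With this shape, callers can filter on the `ruleset` column without first checking how many rulesets were requested.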

malachi-constant · 2022-12-16T17:35:24Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 605be51
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant · 2022-12-16T18:15:49Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 82cd555
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-19T16:35:05Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 516637a
Result: FAILED
Build Logs (available for 30 days)

(enhancement): Extend get and update ruleset DQ methods

605be51

jaidisido self-assigned this Dec 16, 2022

jaidisido requested review from cnfait, kukushking, LeonLuttenberger and malachi-constant December 16, 2022 17:12

jaidisido commented Dec 16, 2022

LeonLuttenberger approved these changes Dec 16, 2022

Merge branch 'main' into enhancement/extend-dq-get-update

82cd555

Various fixes

516637a

jaidisido merged commit 93696bb into main Dec 19, 2022

jaidisido deleted the enhancement/extend-dq-get-update branch December 19, 2022 16:35
