Feature/data 2216 create soda runner by raimundovidaljunior · Pull Request #49 · chocoapp/dataeng-dagger

raimundovidaljunior · 2025-03-19T13:54:22Z

The new Soda operator inherits from the batch operator and automatically fills in some of the batch parameters, such as the job name, as well as some Soda parameters, such as the Soda scan table and the project dir, among others.

Creating new Soda operator
Adding soda configurations to dagger conf
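
As a rough illustration of the inheritance described above, here is a minimal sketch. It does not reproduce the code in this PR: `BatchOperatorStub` stands in for the real batch operator, and the parameter names (`table_name`, `project_dir`), the job-name format, and the `soda_runner` command layout are all assumptions made for the example.

```python
from typing import List, Optional


class BatchOperatorStub:
    """Stand-in for the real batch operator; it only stores what it is given."""

    def __init__(self, job_name: str, command: List[str]) -> None:
        self.job_name = job_name
        self.command = command


class SodaBatchOperatorSketch(BatchOperatorStub):
    """Fills in batch parameters from Soda-specific ones (hypothetical names)."""

    def __init__(
        self,
        table_name: str,
        project_dir: str,
        job_name: Optional[str] = None,
    ) -> None:
        # Derive the batch job name from the table when the caller gives none.
        derived_job_name = job_name or f"soda-{table_name.replace('.', '-')}"
        # Hypothetical command layout; the real runner's arguments may differ.
        command = [
            "soda_runner",
            "--table", table_name,
            "--project-dir", project_dir,
        ]
        super().__init__(job_name=derived_job_name, command=command)
```

With this shape, a caller only supplies the Soda-specific values, and an explicit `job_name` still overrides the derived one.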

Copilot

Pull Request Overview

This PR introduces a Soda operator to run Soda scans by extending the existing batch functionality. Key changes include the addition of a new SodaBatchOperator, a corresponding SodaTask to drive execution with Soda‑specific parameters, and a SodaCreator to generate operator commands; configuration updates and factory registrations complete the integration.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
dagger/dag_creator/airflow/operators/soda_batch.py	New SodaBatchOperator inheriting from AWSBatchOperator
dagger/pipeline/tasks/soda_task.py	New SodaTask with Soda-specific configuration attributes and error handling
dagger/dag_creator/airflow/operator_creators/soda_creator.py	New SodaCreator generating command arguments for Soda tasks
dagger/conf.py	Added Soda configuration defaults
dagger/dagger_config.yaml	Soda section added with commented default parameters
dagger/pipeline/task_factory.py	Registered soda_task in the task factory
dagger/dag_creator/airflow/operator_factory.py	Registered soda_creator in the operator factory
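
The last two rows mention registering the new task and creator in their factories. How dagger's factories actually discover classes is not shown in this PR page, so the following is only a generic registry sketch under assumed names (`TASK_REGISTRY`, `register`, `ref_name`, `create_task`), meant to show what "registering" a task type typically amounts to.

```python
from typing import Dict, Type


class TaskSketch:
    """Hypothetical base class; `ref_name` is the key used in pipeline configs."""

    ref_name = "base"


# Maps a task type name to the class that implements it.
TASK_REGISTRY: Dict[str, Type[TaskSketch]] = {}


def register(cls: Type[TaskSketch]) -> Type[TaskSketch]:
    """Class decorator that adds a task class to the registry."""
    TASK_REGISTRY[cls.ref_name] = cls
    return cls


@register
class SodaTaskSketch(TaskSketch):
    ref_name = "soda"


def create_task(task_type: str) -> TaskSketch:
    """Instantiate the task class registered under `task_type`."""
    try:
        return TASK_REGISTRY[task_type]()
    except KeyError:
        raise ValueError(f"Unknown task type: {task_type}") from None
```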

kiranvasudev · 2025-03-20T06:50:34Z

+                    attribute_name="target_name",
+                    parent_fields=["task_parameters"],
+                    validator=str,
+                    required=True,


why is this required?

good catch! I will change it
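
One way `target_name` could become optional, consistent with the attribute comment quoted in this review ("By default use 'ENV' environment variable"), is sketched below. The helper name `resolve_target_name` and the plain-mapping shape of `task_parameters` are assumptions for illustration, not the change that was actually made.

```python
import os
from typing import Mapping, Optional


def resolve_target_name(
    task_parameters: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit target if given, else the value of ENV, else None."""
    # Allow the environment to be injected so the function is easy to test.
    env = os.environ if environ is None else environ
    explicit = task_parameters.get("target_name")
    if explicit:
        return explicit
    return env.get("ENV")
```

Passing the environment as an argument keeps the fallback explicit and avoids mutating `os.environ` in tests.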

kiranvasudev · 2025-03-20T06:53:12Z

+                    required=True,
+                    comment="Target to load for the given profile. By default use 'ENV' environment variable.",
+                ),
+                Attribute(


why do we need this? could we not use the databricks/athena input from the yaml config?

I just needed a way to differentiate dbt scans from table scans; the way I thought about it was the mutually exclusive params model_name and table_name. I guess we could have an is_dbt_model flag or something and use the input from the yaml config as well. do you think it works better this way?

at the end of the day, a dbt model is a table. is there any way that we could consolidate this on the dagger level?

for example, if there is a profile dir, then we know it's a dbt model and can use the model name. otherwise we just use the dagger output (or something along these lines)
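
The consolidation rule suggested in this thread could look roughly like the sketch below. The function name and its three parameters are hypothetical, and the error handling is an assumption; the thread itself only proposes the "profile dir means dbt model, otherwise use the dagger output" idea.

```python
from typing import Optional


def resolve_scan_table(
    profile_dir: Optional[str],
    model_name: Optional[str],
    dagger_output_table: Optional[str],
) -> str:
    """Pick the table to scan from dbt settings or from the dagger output."""
    if profile_dir:
        # A profile dir signals a dbt model, so the model name is the table.
        if not model_name:
            raise ValueError("profile_dir is set, so model_name is required")
        return model_name
    # No profile dir: fall back to the table declared as the task's output.
    if not dagger_output_table:
        raise ValueError("no profile_dir and no dagger output table to scan")
    return dagger_output_table
```

This removes the need for two mutually exclusive parameters, at the cost of making `profile_dir` double as the dbt switch.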

kiranvasudev · 2025-03-20T06:56:46Z

@raimundovidaljunior could you please add some tests?

claudiazi

lgtm!

raimundovidaljunior added 2 commits March 18, 2025 17:19

Create soda runner

13ccf9e

Fixing configs

2f7cc55

raimundovidaljunior requested a review from a team as a code owner March 19, 2025 13:54

pull-request-size Bot added the size/L label Mar 19, 2025

claudiazi requested a review from Copilot March 19, 2025 16:17

Copilot AI reviewed Mar 19, 2025

kiranvasudev reviewed Mar 20, 2025

Remove unnecessary vars

47f4d83

raimundovidaljunior requested a review from claudiazi March 25, 2025 10:09

claudiazi approved these changes Mar 25, 2025

raimundovidaljunior merged commit 1c7802d into master Mar 25, 2025
1 check passed

raimundovidaljunior merged 3 commits into master from feature/DATA-2216-create-soda-runner


4 participants
