Skip to content

Feature/data 2216 create soda runner#49

Merged
raimundovidaljunior merged 3 commits intomasterfrom
feature/DATA-2216-create-soda-runner
Mar 25, 2025
Merged

Feature/data 2216 create soda runner#49
raimundovidaljunior merged 3 commits intomasterfrom
feature/DATA-2216-create-soda-runner

Conversation

@raimundovidaljunior
Copy link
Copy Markdown

@raimundovidaljunior raimundovidaljunior commented Mar 19, 2025

DATA-2216

The new Soda operator is inherited from the batch operator with automatically filling out some of the batch parameters, like job name, as well as some soda parameters like soda scan table, project dir, etc

Creating new Soda operator
Adding soda configurations to dagger conf

@raimundovidaljunior raimundovidaljunior requested a review from a team as a code owner March 19, 2025 13:54
@claudiazi claudiazi requested a review from Copilot March 19, 2025 16:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a Soda operator to run Soda scans by extending the existing batch functionality. Key changes include the addition of a new SodaBatchOperator, a corresponding SodaTask to drive execution with Soda‑specific parameters, and a SodaCreator to generate operator commands; configuration updates and factory registrations complete the integration.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
dagger/dag_creator/airflow/operators/soda_batch.py New SodaBatchOperator inheriting from AWSBatchOperator
dagger/pipeline/tasks/soda_task.py New SodaTask with Soda-specific configuration attributes and error handling
dagger/dag_creator/airflow/operator_creators/soda_creator.py New SodaCreator generating command arguments for Soda tasks
dagger/conf.py Added Soda configuration defaults
dagger/dagger_config.yaml Soda section added with commented default parameters
dagger/pipeline/task_factory.py Registered soda_task in the task factory
dagger/dag_creator/airflow/operator_factory.py Registered soda_creator in the operator factory

Comment thread dagger/pipeline/tasks/soda_task.py Outdated
attribute_name="target_name",
parent_fields=["task_parameters"],
validator=str,
required=True,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this required?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I will change it

required=True,
comment="Target to load for the given profile. By default use 'ENV' environment variable.",
),
Attribute(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? could we not use the databricks/athena input from the yaml config?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just needed a way to differentiate dbt scans from table scans, the way I thought about it was the mutually exclusive params model_name and table_name. I guess we could have a is_dbt_model or something and use the input from the yaml config as well, do you think it works better this way?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the end of the day, a dbt model is a table. is there any way that we could consolidate this on the dagger level?

for example, if there is a profile dir, then we know its a dbt model, and can use the model name. otherwise we just use the dagger output(or something along these lines)

@kiranvasudev
Copy link
Copy Markdown
Member

@raimundovidaljunior could you please add some tests?

Copy link
Copy Markdown

@claudiazi claudiazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@raimundovidaljunior raimundovidaljunior merged commit 1c7802d into master Mar 25, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants