Feature/data 2216 create soda runner#49
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a Soda operator to run Soda scans by extending the existing batch functionality. Key changes include the addition of a new SodaBatchOperator, a corresponding SodaTask to drive execution with Soda‑specific parameters, and a SodaCreator to generate operator commands; configuration updates and factory registrations complete the integration.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| dagger/dag_creator/airflow/operators/soda_batch.py | New SodaBatchOperator inheriting from AWSBatchOperator |
| dagger/pipeline/tasks/soda_task.py | New SodaTask with Soda-specific configuration attributes and error handling |
| dagger/dag_creator/airflow/operator_creators/soda_creator.py | New SodaCreator generating command arguments for Soda tasks |
| dagger/conf.py | Added Soda configuration defaults |
| dagger/dagger_config.yaml | Soda section added with commented default parameters |
| dagger/pipeline/task_factory.py | Registered soda_task in the task factory |
| dagger/dag_creator/airflow/operator_factory.py | Registered soda_creator in the operator factory |
| attribute_name="target_name", | ||
| parent_fields=["task_parameters"], | ||
| validator=str, | ||
| required=True, |
There was a problem hiding this comment.
good catch! I will change it
| required=True, | ||
| comment="Target to load for the given profile. By default use 'ENV' environment variable.", | ||
| ), | ||
| Attribute( |
There was a problem hiding this comment.
why do we need this? could we not use the databricks/athena input from the yaml config?
There was a problem hiding this comment.
I just needed a way to differentiate dbt scans from table scans, the way I thought about it was the mutually exclusive params model_name and table_name. I guess we could have a is_dbt_model or something and use the input from the yaml config as well, do you think it works better this way?
There was a problem hiding this comment.
at the end of the day, a dbt model is a table. is there any way that we could consolidate this on the dagger level?
for example, if there is a profile dir, then we know its a dbt model, and can use the model name. otherwise we just use the dagger output(or something along these lines)
|
@raimundovidaljunior could you please add some tests? |
DATA-2216
The new Soda operator is inherited from the batch operator with automatically filling out some of the batch parameters, like job name, as well as some soda parameters like soda scan table, project dir, etc
Creating new Soda operator
Adding soda configurations to dagger conf