-
Notifications
You must be signed in to change notification settings - Fork 16.2k
Closed
Labels
Description
Apache Airflow Provider(s)
common-sql
Versions of Apache Airflow Providers
apache-airflow-providers-common-sql==1.0.0
Apache Airflow version
2.3.3 (latest released)
Operating System
macOS Monterey 12.3.1
Deployment
Astronomer
Deployment details
No response
What happened
SQLTableCheckOperatorbuilds multiple checks in such a way that if two or more checks are given, and one is not a fully aggregated statement, then the SQL fails as it is missing aGROUP BYclause.SQLColumnCheckOperatorprovides only the last SQL query built from the columns, so when a check fails, it will only give the correct SQL in the exception statement by coincidence.
What you think should happen instead
- Multiple checks should not need a
GROUP BYclause - Either the correct SQL statement, or no SQL statement, should be returned in the exception message.
How to reproduce
For the SQLTableCheckOperator, using the operator like so:
table_cheforestfire_costs_table_checkscks = SQLTableCheckOperator(
task_id="forestfire_costs_table_checks",
table=SNOWFLAKE_FORESTFIRE_COST_TABLE,
checks={
"row_count_check": {"check_statement": "COUNT(*) = 9"},
"total_cost_check": {"check_statement": "land_damage_cost + property_damage_cost + lost_profits_cost = total_cost"}
}
)
For the SQLColumnCheckOperator, using the operator like so:
cost_column_checks = SQLColumnCheckOperator(
task_id="cost_column_checks",
table=SNOWFLAKE_COST_TABLE,
column_mapping={
"ID": {"null_check": {"equal_to": 0}},
"LAND_DAMAGE_COST": {"min": {"geq_to": 0}},
"PROPERTY_DAMAGE_COST": {"min": {"geq_to": 0}},
"LOST_PROFITS_COST": {"min": {"geq_to": 0}},
}
)
and ensuring that any of the ID, LAND_DAMAGE_COST, or PROPERTY_DAMAGE_COST checks fail.
An example DAG with the correct environment and data can be found here.
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct