Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100+ DAGs fail to import after update to 2.7.1 due to "params without default values" #34227

Closed
1 of 2 tasks
ldacey opened this issue Sep 8, 2023 · 4 comments · Fixed by #34248
Closed
1 of 2 tasks
Assignees
Labels
area:core kind:bug This is a clearly a bug
Milestone

Comments

@ldacey
Copy link
Contributor

ldacey commented Sep 8, 2023

Apache Airflow version

2.7.1

What happened

120 DAGs broke when I updated to Airflow 2.7.1. I had to revert to 2.7.0 again, so the tests below are from a local deployment.

The error message is the same for all DAGs:

Broken DAG: [/opt/airflow/dags/dag-name.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/dag.py", line 640, in __init__
    self.validate_schedule_and_params()
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/dag.py", line 3289, in validate_schedule_and_params
    raise AirflowException(
airflow.exceptions.AirflowException: DAG Schedule must be None, if there are any required params without default values

After testing, I found that if I change the blob_prefix param to "" instead of None, the DAG imports successfully. I have various other params which are None that I will probably need to assign default values for (in reality, the function which Airflow runs handles cases if the value is None, but I guess I need to add a default in the DAG). All 120 DAGs with errors had a field like this in the "params" dictionary I pass to Airflow params.

params = {
    "exclude_pattern": "*errors*",
    "blob_prefix": None,

What you think should happen instead

These are legacy DAGs which have run for a long time without changes (old Airflow instance), so I did not expect an upgrade to break them.

How to reproduce

I believe the root cause is passing a param with the value None to the DAG params:

params = {
  subfolder=None,
  conn_id="gcs_conn_id",
}

@dag(
    dag_id="dag-example",
    start_date=datetime(2023, 2, 2, tz="America/New_York"),
    schedule="41 2 * * *",
    params=params,
)

Operating System

Ubuntu 22.04

Versions of Apache Airflow Providers

Providers info
apache-airflow-providers-celery | 3.3.3
apache-airflow-providers-common-sql | 1.7.1
apache-airflow-providers-docker | 3.7.4
apache-airflow-providers-ftp | 3.5.1
apache-airflow-providers-google | 10.7.0
apache-airflow-providers-http | 4.5.1
apache-airflow-providers-imap | 3.3.1
apache-airflow-providers-microsoft-azure | 6.3.0
apache-airflow-providers-mysql | 5.3.0
apache-airflow-providers-odbc | 4.0.0
apache-airflow-providers-openlineage | 1.0.2
apache-airflow-providers-postgres | 5.6.0
apache-airflow-providers-redis | 3.3.1
apache-airflow-providers-sftp | 4.6.0
apache-airflow-providers-sqlite | 3.4.3
apache-airflow-providers-ssh | 3.7.2

Deployment

Other Docker-based deployment

Deployment details

Docker Swarm using docker stack deploy

Anything else

DAGs fail to import when deployed on 2.7.1

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ldacey ldacey added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Sep 8, 2023
@potiuk potiuk added good first issue and removed needs-triage label for new issues that we didn't triage yet good first issue labels Sep 9, 2023
@potiuk potiuk added this to the Airflow 2.7.2 milestone Sep 9, 2023
@yermalov-here
Copy link
Contributor

looks like it was explicitly changed here #33141

@potiuk
Copy link
Member

potiuk commented Sep 9, 2023

Yes. I think the behaviour when param value was set to None has not been defined. And to be honest i am not sure we should interpret it as 'No default".. The None being default value seems to be pretty standard.

@jens-scheffler-bosch @hussein-awala @uranusjr -> WDYT?

@hussein-awala
Copy link
Member

Yes. I think the behaviour when param value was set to None has not been defined. And to be honest i am not sure we should interpret it as 'No default".. The None being default value seems to be pretty standard.

IMHO we should consider the params defined without a default value as required params, and the fail/exception should be raised when trying to create a DagRun without providing these required params.

@jscheffl
Copy link
Contributor

jscheffl commented Sep 9, 2023

WRONG: I assume currently it is caused by an inconsistency. Because the check assumes to have a Param object but this is not enforced. In the trigger UI there is some wrapping being made to auto-detect typed making a Param object out of standard python data. But not when parsing a DAG.
CORRECT: Ups, I now realized that the ParamsDict makes a kind of auto-boxing so the problem really might be related to #33141 .

None as default value is in my eyes totally right and as the bug is reported this is a valid use case. Also relates to my (still pending revision and open) PR #31301 . In my eyes we should not enforce users to add values for params, a Nonevalue might be valid. In such case also an auto-boxing should not make a parameter required.
UPDATE: I would propose to follow the JSON Schema definitions. If type is defined, then the type is validated. If type is not defined (which is the case when None is auto-boxed into a Param class) then the type is undefined and schema check shall allow any value, which includes null. I just look through the JSON schema specs but can not find how content is handled if typeis not defined...
In abstract still I would refer to the higher level statement: If params are valid, then the DAG can be triggered by a schedule.

I am currently looking though the history of the code, understanding when the problem has been introduced. I volunteer for a fix but want to understand the history first :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants