Store dags cleanup 1 10 #8764

Closed
anitakar wants to merge 1 commit into apache:v1-10-test from anitakar:store_dags_cleanup_1_10

anitakar (Contributor) commented May 7, 2020


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target GitHub issue in description, if one exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

boring-cyborg bot added the area:dev-tools, area:Scheduler, area:serialization, area:webserver, and k8s labels May 7, 2020
@anitakar anitakar force-pushed the store_dags_cleanup_1_10 branch 2 times, most recently from 8614c32 to cf2929c Compare May 7, 2020 11:15
Added missing store_serialized_dags to the _DagBag class.

Read store_serialized_dags from config

This value was read once, when Airflow (scheduler, worker, web server) was
started. We want the web server (which is a long-running process) to read
this value afresh every time.
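
The difference between reading the flag once at startup and reading it on every access can be sketched as follows. This is only an illustration: it uses the standard library's `configparser` as a stand-in for Airflow's configuration object, and the names `STORE_SERIALIZED_DAGS_AT_STARTUP` and `store_serialized_dags()` are hypothetical, not taken from this PR.

```python
import configparser

# Assumption: a stdlib ConfigParser stands in for Airflow's config object.
conf = configparser.ConfigParser()
conf.read_dict({"core": {"store_serialized_dags": "False"}})

# Read once at startup: a long-running process keeps this value forever.
STORE_SERIALIZED_DAGS_AT_STARTUP = conf.getboolean(
    "core", "store_serialized_dags", fallback=False
)


def store_serialized_dags():
    """Hypothetical helper: read the flag afresh on every call."""
    return conf.getboolean("core", "store_serialized_dags", fallback=False)


# The configuration changes while the process is still running.
conf.set("core", "store_serialized_dags", "True")

print(STORE_SERIALIZED_DAGS_AT_STARTUP)  # stale value captured at startup
print(store_serialized_dags())           # current value
```

Only the helper sees the change; the startup constant keeps the value captured when the process began.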

Only access dags from db not disk in a webserver

The scheduler stopped parsing dags because the
DagBag::collect_dags method that it uses returned no dags: it does
not collect dags when store_serialized_dags is on.
Generally, even though the store_serialized_dags parameter is used in
models and throughout common code, it should only be set to True in the
web server.
If it is set to True in other parts of the system, they will stop
working.
The CLI, for example, will stop working.
The scheduler is the component that creates serialized dags, and it should
always read from disk to keep other parts of the system up to date.
But there is no way to check whether dag_bag was called by the web server,
in which case store_serialized_dags should be applied, or by the
scheduler, in which case dags should be read from files on disk rather than from the db.
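
A toy model of the behaviour described in this commit message, assuming a heavily simplified container: `ToyDagBag` and its fields are hypothetical and are not Airflow's `DagBag`. It shows why a flag that is on everywhere leaves nothing to populate the db, while a per-caller flag lets the scheduler read from disk and the web server read from the db.

```python
class ToyDagBag:
    """Hypothetical stand-in for illustration; not Airflow's DagBag."""

    def __init__(self, dag_files, serialized_db, store_serialized_dags=False):
        self.dag_files = dag_files          # dag_id -> source text "on disk"
        self.serialized_db = serialized_db  # dag_id -> serialized form "in the db"
        self.store_serialized_dags = store_serialized_dags
        self.dags = {}

    def collect_dags(self):
        # With the flag on, disk parsing is skipped entirely.
        if self.store_serialized_dags:
            return self.dags
        for dag_id, source in self.dag_files.items():
            self.dags[dag_id] = source
            # The disk-reading side is what keeps the db up to date.
            self.serialized_db[dag_id] = "serialized:" + source
        return self.dags

    def get_dag(self, dag_id):
        # With the flag on, dags are looked up in the db instead of on disk.
        if self.store_serialized_dags:
            return self.serialized_db.get(dag_id)
        return self.dags.get(dag_id)


files = {"example_dag": "dag source"}
db = {}

# Flag on everywhere: nothing parses files, so the db is never populated.
broken_scheduler = ToyDagBag(files, db, store_serialized_dags=True)
broken_result = dict(broken_scheduler.collect_dags())

# Flag chosen per caller: the scheduler reads disk, the web server reads the db.
scheduler = ToyDagBag(files, db, store_serialized_dags=False)
scheduler.collect_dags()
webserver = ToyDagBag(files, db, store_serialized_dags=True)
webserver_dag = webserver.get_dag("example_dag")
```

In this sketch the caller chooses the flag explicitly when constructing the bag, which is one possible way around the "no way to check who called" problem; it is not necessarily what this PR implements.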
@anitakar anitakar force-pushed the store_dags_cleanup_1_10 branch from cf2929c to 7f7dc0f Compare May 7, 2020 11:43
ashb removed the area:Scheduler, area:webserver, area:dev-tools, and k8s labels May 7, 2020
ashb (Member) commented May 7, 2020

I'm going to close this, as the main feature here is not needed with Dag serialization on, and the other changes are a performance regression.

We can re-open this if you disagree.

@ashb ashb closed this May 7, 2020
2 participants