Triggering DAG from UI Creates 2 DAG runs if the DAG was paused #24534

Open
1 of 2 tasks
RNHTTR opened this issue Jun 17, 2022 · 22 comments
Assignees
Labels
affected_version:2.3 Issues Reported for 2.3 area:core good first issue kind:bug This is a clearly a bug

Comments

@RNHTTR
Collaborator

RNHTTR commented Jun 17, 2022

Apache Airflow version

2.3.2 (latest released)

What happened

If a DAG with a schedule_interval set to @daily is paused, and the DAG is triggered by clicking the "play" button --> "Trigger DAG", two DAG runs will simultaneously be triggered: One for the current data interval and one for the manual run.

image

image

What you think should happen instead

Maybe an alert with confirmation should pop up and inform the user that the DAG run will be for the current data interval rather than the current datetime?

How to reproduce

  1. Pause a DAG that hasn't run in the current data interval.
  2. Trigger the DAG by clicking the "play" button then clicking "Trigger DAG"

Operating System

Debian 11

Versions of Apache Airflow Providers

n/a

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@RNHTTR RNHTTR added area:core kind:bug This is a clearly a bug labels Jun 17, 2022
@potiuk
Member

potiuk commented Jun 19, 2022

Interesting edge-case. Would you like to take a stab at it @RNHTTR ?

@RNHTTR
Collaborator Author

RNHTTR commented Jun 23, 2022

I will circle back in a couple weeks, but I don't have time at the moment.

@eladkal eladkal added the affected_version:2.3 Issues Reported for 2.3 label Jul 5, 2022
@bispaul

bispaul commented Jul 10, 2022

Hi @potiuk , Can I help out on this? I probably will need just a little guidance on getting started.

@rohit-mobstac

Is this issue solved? I can see the same issue happening in AWS MWAA as well.

@potiuk
Member

potiuk commented Sep 13, 2022

I think it would have been closed by now if it were. But if you want to take a stab at fixing it, @rohit-mobstac, feel free. @bispaul, I missed your comment earlier, but I have no specific guidance to offer: this just needs a diagnosis of why it happens and then a fix, following the regular contribution workflow :)

@potiuk
Member

potiuk commented Sep 13, 2022

And just to explain: I have no idea why it happens, and 95% of solving this issue is tracking down why it happens. The fix will probably be a one-liner, so any guidance I could give would amount to fixing the problem. I have other pressing issues, and this one is better suited to someone who wants to take their first steps in contributing to Airflow. It is at most an annoying edge case, nothing that needs fixing immediately, so it can wait for someone to pick it up (this is why it is marked as "good first issue").

@rohit-mobstac

This is not a fix, but something I have noticed: unpausing a scheduled DAG automatically kicks off a DAG run. Maybe you do not have to explicitly trigger the paused DAG and can just enable it, @RNHTTR.
Seems like a good issue. Will look into this

@bispaul

bispaul commented Sep 13, 2022

Hi @potiuk please assign it to me.

@potiuk
Member

potiuk commented Sep 13, 2022

Just did :)

@RNHTTR
Collaborator Author

RNHTTR commented Sep 13, 2022

@rohit-mobstac If a DAG is turned on within its data interval, unpausing it will trigger a DAG run, which I think is to be expected. I think there should be some kind of logic that prevents that scheduled DAG run from running when you trigger the DAG from the UI, or at least an option to block it.

@eladkal
Contributor

eladkal commented Jan 11, 2023

@bispaul are you still working on this issue?

@bispaul

bispaul commented Jan 11, 2023

Hi @eladkal, I am working on it, but unfortunately I haven't been able to replicate the issue. Can anybody help me replicate it?

@eladkal
Contributor

eladkal commented Jan 11, 2023

It could be that this was already fixed in main?
@RNHTTR can you please check?

@eladkal eladkal added Can't Reproduce The problem cannot be reproduced pending-response labels Jan 11, 2023
@RNHTTR
Collaborator Author

RNHTTR commented Jan 11, 2023

I can still reproduce on 2.5.0:

If you have a paused DAG, and the DAG hasn't run for the active data interval, manually triggering the DAG using the UI will simultaneously trigger two DAG runs:

  1. The manual run with a run_id similar to manual__2023-01-11T15:27:32.643794+00:00
  2. The scheduled run, because the DAG needs to become unpaused to schedule tasks for the manual run

If you can't reproduce it, I recommend confirming that there are no dagruns within the current data interval.
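
The two-run behaviour described above can be sketched with a toy model. This is not Airflow code: ToyDag, scheduler_tick and trigger_from_ui are hypothetical names, and the rule that the scheduler picks the most recent fully elapsed interval is an assumption of the sketch. It only illustrates why unpausing as part of a manual trigger can produce a scheduled run alongside the manual one, and why a DAG without a schedule gets only the manual run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ToyDag:
    """Hypothetical stand-in for a DAG; not an Airflow class."""
    interval: Optional[timedelta]  # None models a DAG with no schedule
    start: datetime
    is_paused: bool = True
    runs: list = field(default_factory=list)  # (run_type, logical_date) pairs


def latest_interval_start(dag: ToyDag, now: datetime) -> Optional[datetime]:
    """Start of the most recent data interval that has fully elapsed, if any."""
    if dag.interval is None or now < dag.start + dag.interval:
        return None
    elapsed = (now - dag.start) // dag.interval  # number of complete intervals
    return dag.start + (elapsed - 1) * dag.interval


def scheduler_tick(dag: ToyDag, now: datetime) -> None:
    """Add a scheduled run if the DAG is unpaused and the latest interval has none."""
    if dag.is_paused:
        return
    logical_date = latest_interval_start(dag, now)
    if logical_date is not None and ("scheduled", logical_date) not in dag.runs:
        dag.runs.append(("scheduled", logical_date))


def trigger_from_ui(dag: ToyDag, now: datetime) -> None:
    """Model the flow described in this thread: the trigger unpauses the DAG."""
    dag.is_paused = False
    dag.runs.append(("manual", now))
    scheduler_tick(dag, now)  # the scheduler now sees an unpaused DAG


now = datetime(2023, 1, 11, 15, 27, tzinfo=timezone.utc)
start = datetime(2023, 1, 1, tzinfo=timezone.utc)

daily = ToyDag(interval=timedelta(days=1), start=start)
trigger_from_ui(daily, now)
print([run_type for run_type, _ in daily.runs])  # ['manual', 'scheduled']

unscheduled = ToyDag(interval=None, start=start)
trigger_from_ui(unscheduled, now)
print([run_type for run_type, _ in unscheduled.runs])  # ['manual']
```

If a scheduled run already exists for the latest interval, scheduler_tick adds nothing, which matches the advice to confirm that there are no dagruns within the current data interval before trying to reproduce.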

I think the easiest change that would be generally good would be:

If a DAG is paused, unpausing it would cause a dagrun to be scheduled, and the user tries to trigger the DAG by clicking the play button in the UI, the UI should show a warning indicating that a scheduled run will also be triggered.

@eladkal eladkal removed Can't Reproduce The problem cannot be reproduced pending-response labels Jan 16, 2023
@bispaul

bispaul commented Mar 12, 2023

Hi @RNHTTR, is there a way to run the job manually without the job getting unpaused?
I tried the CLI command airflow dags trigger example_branch_dop_operator_v3. However, the job stays in the queued state.
Is showing the warning the best way to go?

@ajithshetty

ajithshetty commented Jun 8, 2023

I could see this happening in MWAA 2.5.1.

  1. Set the schedule to @once
  2. Make sure the DAG is Paused
  3. Delete all previous runs (if any)
  4. Trigger the DAG from UI

For a DAG whose schedule is "None", however, triggering it from the UI creates only 1 DAG run.
As already discussed above, when you run a paused DAG, it will:

  1. Turn on the DAG
  2. Let the default schedule kick in and execute accordingly
  3. Execute another run as per the user's request (from the UI)

Is there any update on this, please?

@RNHTTR
Collaborator Author

RNHTTR commented Jun 8, 2023

@bispaul sorry for the delayed response. No, a DAG must be unpaused in order to be run. You could prevent non-manual runs from being triggered by removing the schedule interval, though.

@vaibhavnsingh

vaibhavnsingh commented Nov 29, 2023

Airflow Version 2.5.0

Schedule Created time - 5:52 PM (frequency – 4 mins, starting 6:00 PM)

1st trigger - 6:04 PM
2nd trigger - 6:08 PM

Disabled - 6:09 PM
Enabled - 6:22 PM

3rd trigger - 6:22 PM (should have been skipped), which was scheduled for 6:16 PM
4th trigger - 6:22 PM (should have been skipped), which was scheduled for 6:20 PM

So when enabled at 6:22 it executed 2 runs. I have observed that Airflow sometimes executes the last 2 instances that fell within the paused period, and then executes a scheduled run when the current interval expires at 6:24 PM. Has anyone else observed the same?

I have also observed that another setup executes only the last interval, which in the case above would be 6:20 PM.
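
As a rough check of the timeline above, the schedule points that fall between the pause and the unpause can be enumerated with plain datetime arithmetic. This is only a sketch: missed_schedule_points is a hypothetical helper, the calendar date is a placeholder (the comment gives times only), and nothing here calls Airflow.

```python
from datetime import datetime, timedelta


def missed_schedule_points(first, every, paused_at, resumed_at):
    """Return the schedule points t with paused_at < t <= resumed_at."""
    points = []
    t = first
    while t <= resumed_at:
        if t > paused_at:
            points.append(t)
        t += every
    return points


day = datetime(2023, 11, 29)  # placeholder date, not stated in the comment
missed = missed_schedule_points(
    first=day.replace(hour=18, minute=4),        # 1st trigger at 6:04 PM
    every=timedelta(minutes=4),
    paused_at=day.replace(hour=18, minute=9),    # disabled at 6:09 PM
    resumed_at=day.replace(hour=18, minute=22),  # enabled at 6:22 PM
)
print([t.strftime("%H:%M") for t in missed])  # ['18:12', '18:16', '18:20']
```

Three schedule points were missed while the DAG was paused. As the Airflow documentation describes catchup, catchup=True would be expected to run all three and catchup=False only the latest one (6:20 PM), so the two runs reported here (6:16 PM and 6:20 PM) match neither expectation in this simple model.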

@RNHTTR
Collaborator Author

RNHTTR commented Nov 29, 2023

I no longer think this is actually a bug for DAG runs that are within a DAG's schedule interval. It makes sense that when a DAG is unpaused, the run for the current DAG interval is triggered. Triggering a DAG run via the "play" button unpauses the corresponding DAG, so this behavior is normal.

What could be improved is that this flow is not obvious. I think it'd be nice if the UI made it clear to the user that the action will unpause the DAG and that one or more (if catchup=True) scheduled DAG runs will be triggered as a result, either with a modal or maybe a toast or something.
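
A minimal sketch of the message logic such a modal or toast might use. trigger_warning and its parameters are hypothetical, not part of Airflow's UI code, and the number of pending scheduled runs is assumed to be computed elsewhere.

```python
from typing import Optional


def trigger_warning(is_paused: bool, pending_scheduled_runs: int) -> Optional[str]:
    """Return text for a confirmation modal or toast, or None if no warning is needed."""
    if not is_paused:
        return None  # triggering an unpaused DAG has no side effect to warn about
    message = "Triggering this DAG will unpause it."
    if pending_scheduled_runs == 1:
        message += " 1 scheduled DAG run will also start."
    elif pending_scheduled_runs > 1:
        message += f" {pending_scheduled_runs} scheduled DAG runs will also start."
    return message


print(trigger_warning(True, 1))
# Triggering this DAG will unpause it. 1 scheduled DAG run will also start.
print(trigger_warning(False, 3))
# None
```

Returning None for an unpaused DAG keeps the existing trigger flow unchanged in the common case.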

@bispaul are you still working on this?

@vaibhavnsingh

I agree with your comment, @RNHTTR. My understanding is that once I resume, the most recent interval that has expired should execute, but in my case intervals n and n-1 are both executed. That is odd, as I do not see the same behaviour on a different setup, where only n is executed.

@bispaul

bispaul commented Nov 30, 2023

Hi @RNHTTR , I can start working on it now that it is clear a message needs to be displayed.

@vaibhavnsingh

I probably did not state my problem clearly. Basically, when a DAG is paused, the next run keeps refreshing based on the interval, so when the DAG is enabled, it executes the last run that had expired. But in my case the next run doesn't refresh: it stays at the time when the DAG was paused. So when enabled, it executes both this old run instance and the new one.

As you can see below, the current time is 04-DEC-2023 10:30, but the next run still shows the time as

image

7 participants