Clarify how schedule_interval works #221
Thanks for all the details, they should make it much easier to zero in on what's going on (actually, on what is NOT going on...). And your expectations are right, though Airflow typically isn't used for every-minute batch jobs; I understand you did this for testing purposes. So it looks like your scheduler is aware of your DAG, and so is the worker. The first thing I would do is look into the log file on the worker (if you cannot see). Was that job triggered by the scheduler or by the UI "Run" button? Is the scheduler emitting messages? Are the workers picking them up? If so, the worker and the scheduler may be pointing to different metadata databases, so make sure their configurations are in sync.
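One way to rule out the "different metadata databases" case is to diff the relevant settings between the scheduler's and the worker's config files. The sketch below assumes both are airflow.cfg-style INI files and that the metadata connection lives under [core] sql_alchemy_conn, which matches Airflow versions of this era but should be checked against your own; config_mismatches and SHARED_SETTINGS are hypothetical helper names, not part of Airflow.

```python
import configparser

# Settings the scheduler and the workers must agree on. The section/key names
# follow the airflow.cfg layout of this era ([core] sql_alchemy_conn and
# [core] executor); they are assumptions, so adjust them for your version.
SHARED_SETTINGS = [
    ("core", "sql_alchemy_conn"),
    ("core", "executor"),
]


def load_cfg(text):
    # interpolation=None keeps '%' characters in connection strings verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def config_mismatches(scheduler_cfg, worker_cfg, settings=SHARED_SETTINGS):
    """Hypothetical helper: return (section, key, scheduler_value, worker_value)
    for every setting whose value differs (None means the setting is missing)."""
    sched, work = load_cfg(scheduler_cfg), load_cfg(worker_cfg)
    diffs = []
    for section, key in settings:
        a = sched.get(section, key, fallback=None)
        b = work.get(section, key, fallback=None)
        if a != b:
            diffs.append((section, key, a, b))
    return diffs


if __name__ == "__main__":
    # Example contents, invented for illustration.
    scheduler_cfg = "[core]\nexecutor = CeleryExecutor\nsql_alchemy_conn = postgresql://airflow@db/airflow\n"
    worker_cfg = "[core]\nexecutor = CeleryExecutor\nsql_alchemy_conn = sqlite:////tmp/airflow.db\n"
    for section, key, a, b in config_mismatches(scheduler_cfg, worker_cfg):
        print(f"[{section}] {key}: scheduler={a!r} worker={b!r}")
```

In practice you would pass the text of each machine's config file (for example, open(path).read()); an empty list means the compared settings agree.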
Thanks for the quick response @mistercrunch. The job was definitely triggered by the UI run button. The scheduler isn't emitting any messages other than the one below, every 5 seconds.
You are right, it looks to me like the scheduler and the workers are talking past each other. But at the same time I am not seeing any action from the scheduler :\
The job view can be misleading; it looks fine to me, so you can disregard it for now. The scheduler should be emitting logs saying "First run for ..." or "Queuing next run:". From the context where you run the scheduler, can you issue the command:
I'd also wipe out the
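To check whether the scheduler is queuing anything at all, its log can be filtered for the two messages quoted in the comment above. This is a minimal sketch: it assumes the log wording matches those phrases exactly (it may differ between Airflow versions), and queuing_lines and QUEUE_MARKERS are hypothetical names.

```python
# Phrases quoted in this thread as what the scheduler logs when it queues
# work; the exact wording may differ between Airflow versions.
QUEUE_MARKERS = ("First run for", "Queuing next run")


def queuing_lines(log_lines, markers=QUEUE_MARKERS):
    """Hypothetical helper: keep only the log lines that mention a marker.

    log_lines can be any iterable of strings, e.g. an open log file."""
    return [line.rstrip("\n") for line in log_lines
            if any(marker in line for marker in markers)]
```

Passing an open file object for the scheduler log works directly; an empty result means none of the expected messages were logged, which points at the scheduler rather than the workers.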
This is the outcome of the command.
Full output:
Still no luck, but thanks again for your help :)
Here is a gist of the config I am using too: https://gist.github.com/maraca/6fb10e135cadbb0b65bd
A bit of a wild guess here, but can you try changing your config to use absolute (non-relative) folder paths?
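Relative paths in a config are typically resolved against each process's current working directory, so a scheduler and a worker started from different directories can end up looking at different DAG or log folders. A minimal sketch for spotting such values, assuming the path settings are [core] dags_folder and [core] base_log_folder (adjust for your airflow.cfg); relative_paths is a hypothetical helper, and values templated with {AIRFLOW_HOME} are not expanded here, so they will be reported as relative.

```python
import configparser
import os

# Path-valued settings to check; [core] dags_folder and base_log_folder are
# assumed airflow.cfg names, extend the list for your setup.
PATH_SETTINGS = [("core", "dags_folder"), ("core", "base_log_folder")]


def relative_paths(cfg_text, settings=PATH_SETTINGS):
    """Hypothetical helper: map each path setting that is still relative to its value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    found = {}
    for section, key in settings:
        value = parser.get(section, key, fallback=None)
        # '~' is expanded first so "~/airflow/dags" counts as absolute.
        if value and not os.path.isabs(os.path.expanduser(value)):
            found[key] = value
    return found


print(relative_paths("[core]\ndags_folder = dags\nbase_log_folder = /var/log/airflow\n"))
```

Anything reported here is a candidate for replacing with a full path (on POSIX systems, one starting with /).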
Also, can you throw a file
I think you'll get a few more logging events from the scheduler, which may help.
Also, please share the output of
We'll get to the bottom of this!
@mistercrunch sorry for the delayed response, I was traveling last night. The following is not going to be very useful, as I cannot for the life of me figure out why it started working. Here is what I did.
I attempted to revert all those steps to see where it breaks, and although I am back to where I was yesterday config-wise, the scheduler is still scheduling tasks and the worker is still picking them up.
This is great news because it now works; however, I am not able to identify what I did to get it running :(
I think I found the culprit...
- 'start_date': datetime(2015, 8, 6, 8, 4),
+ 'start_date': datetime(2015, 8, 5, 8, 4),
If I set the start date to today, it doesn't schedule the job. If I set it to yesterday's date, it works.
>>> from datetime import datetime
>>> start_date = datetime(2015, 8, 6, 8, 4)  # 2015-08-06 08:04am
>>> (datetime.now() - start_date).total_seconds()
2003.481128
What's strange is that the
In any case, having a
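The REPL output above shows that only about 2003 seconds had passed since start_date, while a daily schedule_interval covers 86400 seconds. A simplified model of the rule that explains this, assuming a run is only created once start_date + schedule_interval has passed; this is consistent with what was observed in this thread, but it is a sketch, not the scheduler's actual code, and first_run_due is a hypothetical name.

```python
from datetime import datetime, timedelta


def first_run_due(start_date, schedule_interval, now):
    """Simplified model (assumption): the run for the period that begins at
    start_date is only created once that whole period has elapsed."""
    return now >= start_date + schedule_interval


start = datetime(2015, 8, 6, 8, 4)
now = start + timedelta(seconds=2003)  # roughly the REPL result above

print(first_run_due(start, timedelta(days=1), now))                       # False
print(first_run_due(start - timedelta(days=1), timedelta(days=1), now))   # True
print(first_run_due(start, timedelta(minutes=1), now))                    # True
```

Moving start_date back one day makes the first daily period complete immediately, which matches the "yesterday works, today doesn't" observation.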
Oh. I should have caught this earlier, but the issue is that your DAG is actually a daily DAG since at the moment
I need to clarify that in the docs / API.
Thanks for the clarification @mistercrunch! I had the same issue, and moving schedule_interval from default_args into DAG() solved it nicely.
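The fix described above (moving schedule_interval out of default_args and into the DAG() call) can be illustrated with a toy model. ToyDAG and ToyTask are hypothetical stand-ins, not Airflow classes; they only mimic the behavior discussed in this thread: default_args feed task-level arguments, while the DAG reads schedule_interval from its own keyword argument and otherwise stays daily. With real Airflow, the equivalent is passing schedule_interval=timedelta(minutes=1) to DAG(...) directly.

```python
from datetime import datetime, timedelta


class ToyDAG:
    """Hypothetical stand-in for airflow.DAG: schedule_interval comes only from
    its own keyword argument, defaulting to one day as described in this thread."""

    def __init__(self, dag_id, default_args=None, schedule_interval=timedelta(days=1)):
        self.dag_id = dag_id
        self.default_args = dict(default_args or {})
        self.schedule_interval = schedule_interval


class ToyTask:
    """Hypothetical stand-in for an operator: default_args fill in task kwargs."""

    def __init__(self, task_id, dag, **kwargs):
        self.task_id = task_id
        self.dag = dag
        self.params = {**dag.default_args, **kwargs}


default_args = {
    "owner": "airflow",
    "start_date": datetime(2015, 8, 5, 8, 4),
    "schedule_interval": timedelta(minutes=1),  # never reaches the DAG itself
}

wrong = ToyDAG("events_redshift", default_args=default_args)
right = ToyDAG("events_redshift", default_args=default_args,
               schedule_interval=timedelta(minutes=1))
task = ToyTask("download_from_s3", dag=wrong)

print(wrong.schedule_interval)           # 1 day, 0:00:00
print(right.schedule_interval)           # 0:01:00
print(task.params["schedule_interval"])  # 0:01:00 (only the task sees it)
```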
Updated the tutorials / docs here: #238
Hi,
This is a dummy example that consists of 4 tasks, back to back, all attached to the same DAG, events_redshift. I've set schedule_interval to 1 for now, as I am trying to see this executed, but that's not a real-life example. This is running the CeleryExecutor and PostgreSQL. I can see the DAG in the web UI; however, the only way to get it to execute the tasks is by clicking on it and choosing Run manually, as you can see with download_from_s3.
This is the Celery worker:
And the scheduler's output, refreshing every 5 seconds.
My expectation is that this should run every minute, with each task executed back to back; however, none of this is happening.
So I guess my question is: do I have the wrong expectation, and what am I doing wrong?
Thanks a lot for your help!
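For reference, the expectation in the original post (a run every minute, four tasks back to back) can be written out as a timeline. This sketch assumes one DAG run per fully elapsed interval and runs handled strictly one after another, which is an idealization (real runs can overlap); only download_from_s3 comes from the post, while the other task ids and expected_task_runs are hypothetical.

```python
from datetime import datetime, timedelta

# Only download_from_s3 is named in the post; the other ids are placeholders.
CHAIN = ["download_from_s3", "task_2", "task_3", "task_4"]


def expected_task_runs(start_date, schedule_interval, now, chain=CHAIN):
    """Yield (execution_date, task_id) in the order an idealized scheduler
    would run a back-to-back chain: one DAG run per fully elapsed interval,
    runs handled one after another."""
    period_start = start_date
    while period_start + schedule_interval <= now:
        for task_id in chain:
            yield period_start, task_id
        period_start += schedule_interval


start = datetime(2015, 8, 5, 8, 4)
now = datetime(2015, 8, 5, 8, 6, 30)
for execution_date, task_id in expected_task_runs(start, timedelta(minutes=1), now):
    print(execution_date.strftime("%H:%M"), task_id)
```

With a one-day interval instead of one minute, the same call yields nothing until a full day after start_date, which is why the timeline only matches the expectation once the interval is actually set on the DAG.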