Skip to content

Reimplementation of RFC5545 Schedule#1071

Merged
ssalinas merged 6 commits intomasterfrom
rfc_sched
Jun 16, 2016
Merged

Reimplementation of RFC5545 Schedule#1071
ssalinas merged 6 commits intomasterfrom
rfc_sched

Conversation

@ssalinas
Copy link
Copy Markdown
Contributor

@ssalinas ssalinas commented Jun 8, 2016

Realized the branch that @grepsr started had a lot of odd merge artifacts present and was difficult to get into our staging environment. Reimplemented the functionality here based on that work, with a few tweaks + added basic tests. Usage via the api remains the same

caveats/changes from #931:

  • DTSTART functionality as it was implemented in RFC 5545 recur schedule support #931 cannot handle a provided timezone correctly, and did not parse out the RRULE string separately. I've updated so a simple DTSTART=xxxxxxx will function, however specification of time zone will still throw an error at the moment
  • removed the limiter in year 2100 when the schedule is infinite since it's a bit arbitrary and instead implemented a max iterations (10k) for the finding of next valid time. Unless we are using a very old DTSTART and a very frequent schedule, we should not be hitting this anyways

Thanks @grepsr for starting this. I think it will be easier for us to get into our staging environments and into a release for you from this branch.

/cc @tpetr

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 8, 2016

@ssalinas this is great!

Thanks for the re-implementation. There were so many changes since I last added this, it was very hard to merge. I'll test this out and look forward to the release:)

valid = false;
}
checkBadRequest(valid, message);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could simplify this by just calling badRequest() in the catch block

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ssalinas
Copy link
Copy Markdown
Contributor Author

ssalinas commented Jun 8, 2016

@grepsr this seemed to work ok in our test infra, but you probably have a better grasp of RFC and your usage of it than I do. Can you please test out this branch if you get a chance?

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 9, 2016

@ssalinas certainly, I will try this branch out in local and provide feedback!

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 9, 2016

I re-compiled and used it as a drop in replacement and it works great!

My only concern is regarding MAX_ITERATIONS = 10000

Lets say I schedule something to run every 1 hour starting June 9, 2016 (DTSTART).
10k hours = 1.14 years

It's been a while since I did this, but I think the internal loop will always start from the DTSTART, which is June 9, 2016

I think after 1.14 years of scheduling, which is not a lot - the schedule will stop?

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 9, 2016

Perhaps we should go for a bigger number, like 1 million.

@ssalinas
Copy link
Copy Markdown
Contributor Author

ssalinas commented Jun 9, 2016

@grepsr @tpetr made a quick update in 2a69772 that does the following:

  • If there is not a COUNT or REPEAT part in the rule, take the max of DTSTART or current time as our DTSTART. In this way, we limit the number of cycles needed to find our next recurrence by starting further along. For an infinite loop, it essentially doesn't matter where DTSTART is, so we should then always be at a DTSTART where the first/next recurrence we come accross is the one we want to schedule.
  • If there is a COUNT specified, use the max of that or MAX_ITERATIONS as the loop limiter

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 13, 2016

Seems like a fair solution, not sure why the build failed though.

The only issue I see for this (but I might be wrong, depending on how Singularity schedules internally) - if for some reason the schedule gets delayed by say even a few minutes, the resulting "next" schedule will add on top of the delayed DTSTART, which may cause the future schedule to be delayed forever?

Thanks

@ssalinas
Copy link
Copy Markdown
Contributor Author

Due to the fact that we are comparing the 'next' time we calculate to the current 'now' time, the issue you mention is still the case for the branch you started on as well. That might be something we live with for the moment, but as a future improvement, the scheduler could keep track of the last time its poller ran, and use that as the time to compare to instead of the current time

@grepsr
Copy link
Copy Markdown

grepsr commented Jun 16, 2016

Sounds good - I'm going to start trying this in production around next week (about 1000 long running scheduled docker tasks)

Thanks

@ssalinas ssalinas added this to the 0.8.0 milestone Jun 16, 2016
@ssalinas ssalinas merged commit 7af386d into master Jun 16, 2016
@ssalinas ssalinas deleted the rfc_sched branch June 16, 2016 15:52
@grepsr
Copy link
Copy Markdown

grepsr commented Jul 26, 2016

Found a big issue in this re-implementation today.
Starting line 32:
if (now.getMillis() > start.getMillis()) { start = now; }

I think this condition should not be there. I have not fully tested yet, but this likely caused a schedule bug in my system today (I upgraded 2 days ago to 0.8.0)

Lets say a scheduled task runs every 3 hours.

It ran on 8 AM, and takes 30 minutes to finish. Technically, the next task should run at 11 AM, but I think due to this, it runs at around 11:30 AM (it uses the finish time as the next start time) This way, it will keep pushing the task 30 minutes late.

@ssalinas Your thoughts?

@ssalinas
Copy link
Copy Markdown
Contributor Author

@grepsr , I had originally put that in to save on the number of iterations needed in order to find the next valid recurrence. Can you post the schedule you were using when you saw this pop up?

@grepsr
Copy link
Copy Markdown

grepsr commented Jul 27, 2016

@ssalinas It happens on all schedules. I have about 300 scheduled tasks running, and all of them got pushed and the next schedule ran with the start date/time on when it last finished. Then when it ran again, it got further pushed. Only the first run was on time.

I think there is no way around it. To get the right nextSchedule, we need to have it run all the loop.

@grepsr
Copy link
Copy Markdown

grepsr commented Jul 27, 2016

An example of the recurrence rule where this schedules next update wrongly is as follows. However, the rule does not really matter, every task is pushed further back depending on how long it takes.
FREQ=DAILY;DTSTART=20160402T071500;INTERVAL=1;UNTIL=20250226T000000

The problem is, when the task finishes, it calculates the next update at that very instance; and then it goes into the condition where start becomes now, and messes up the next update time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants