Reimplementation of RFC5545 Schedule #1071

Merged
merged 6 commits into master from rfc_sched on Jun 16, 2016

Conversation

@ssalinas
Member

ssalinas commented Jun 8, 2016

Realized the branch that @grepsr started had a lot of odd merge artifacts and was difficult to get into our staging environment. I've reimplemented the functionality here based on that work, with a few tweaks, and added basic tests. Usage via the API remains the same.

caveats/changes from #931:

  • DTSTART functionality as implemented in #931 could not handle a provided time zone correctly and did not parse out the RRULE string separately. I've updated it so a simple DTSTART=xxxxxxx will work; however, specifying a time zone will still throw an error at the moment
  • Removed the year-2100 limiter for infinite schedules, since it was a bit arbitrary, and instead implemented a max iteration count (10k) for finding the next valid time (see the sketch after this list). Unless we are using a very old DTSTART and a very frequent schedule, we should not hit this anyway
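
A rough sketch of how the iteration cap works (illustrative only; the iterator type and helper names below are simplified stand-ins, not the exact code in this PR):

  import java.util.Iterator;
  import org.joda.time.DateTime;
  import com.google.common.base.Optional;

  // 'recurrences' stands in for whatever iterator the RRULE library yields from the parsed
  // rule + DTSTART; MAX_ITERATIONS bounds the search instead of the old year-2100 cutoff.
  private static final int MAX_ITERATIONS = 10000;

  Optional<DateTime> nextRunAfter(Iterator<DateTime> recurrences, DateTime now) {
    int iterations = 0;
    while (recurrences.hasNext() && iterations < MAX_ITERATIONS) {
      DateTime candidate = recurrences.next();
      if (candidate.isAfter(now)) {
        return Optional.of(candidate);  // first recurrence after 'now' is the next run time
      }
      iterations++;
    }
    return Optional.absent();           // gave up within the cap (or the rule was exhausted)
  }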

Thanks @grepsr for starting this. I think it will be easier for us to get into our staging environments and into a release for you from this branch.

/cc @tpetr

@ghost

ghost commented Jun 8, 2016

@ssalinas this is great!

Thanks for the re-implementation. There were so many changes since I last added this that it was very hard to merge. I'll test this out and look forward to the release. :)

...ice/src/main/java/com/hubspot/singularity/data/SingularityValidator.java
valid = false;
}
checkBadRequest(valid, message);
}

@tpetr

tpetr Jun 8, 2016

Member

you could simplify this by just calling badRequest() in the catch block
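
Something along these lines (illustrative; the parse call is a placeholder, and 'message' is the same message built earlier in the snippet above):

  try {
    parseRRuleString(schedule);   // placeholder for the actual RRULE parsing call
  } catch (Exception e) {
    badRequest(message);          // fail fast here, no need to track a 'valid' flag
  }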

@ssalinas

ssalinas Jun 8, 2016

Member

fixed

@ssalinas

Member

ssalinas commented Jun 8, 2016

@grepsr this seemed to work OK in our test infra, but you probably have a better grasp of the RFC and your usage of it than I do. Can you please test out this branch if you get a chance?

@ghost

ghost commented Jun 9, 2016

@ssalinas certainly, I will try this branch out locally and provide feedback!

@ghost

ghost commented Jun 9, 2016

I recompiled and used it as a drop-in replacement and it works great!

My only concern is regarding MAX_ITERATIONS = 10000.

Let's say I schedule something to run every hour starting June 9, 2016 (DTSTART).
10,000 iterations × 1 hour ≈ 417 days ≈ 1.14 years.

It's been a while since I did this, but I think the internal loop will always start from the DTSTART, which is June 9, 2016.

I think after about 1.14 years of scheduling, which is not a lot, the schedule will stop?

@ghost

ghost commented Jun 9, 2016

Perhaps we should go for a bigger number, like 1 million.

@ssalinas

Member

ssalinas commented Jun 9, 2016

@grepsr @tpetr made a quick update in 2a69772 that does the following:

  • If there is not a COUNT or REPEAT part in the rule, take the later of DTSTART or the current time as our DTSTART. This limits the number of cycles needed to find the next recurrence by starting further along. For an infinite schedule it essentially doesn't matter where DTSTART is, so we should then always be at a DTSTART where the first/next recurrence we come across is the one we want to schedule.
  • If there is a COUNT specified, use the max of that or MAX_ITERATIONS as the loop limiter (a rough sketch of both changes follows this list)
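
Roughly, the change looks like this (illustrative names; the rule accessors below are hypothetical helpers rather than the exact library API):

  // No COUNT/REPEAT part: start the search at the later of DTSTART and 'now', which keeps the
  // iteration count small without changing which recurrence comes next.
  DateTime searchStart = dtStart;
  if (!hasCountOrRepeat(rrule) && now.isAfter(dtStart)) {
    searchStart = now;
  }
  // COUNT present: let it raise the iteration cap so a long finite rule isn't truncated.
  int iterationLimit = hasCount(rrule) ? Math.max(getCount(rrule), MAX_ITERATIONS) : MAX_ITERATIONS;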

@ssalinas ssalinas added the hs_stable label Jun 10, 2016

@ghost

ghost commented Jun 13, 2016

Seems like a fair solution; not sure why the build failed, though.

The only issue I see with this (but I might be wrong, depending on how Singularity schedules internally): if for some reason the schedule gets delayed by even a few minutes, the resulting "next" schedule will add on top of the delayed DTSTART, which may cause future schedules to be delayed forever?

Thanks

@ssalinas

Member

ssalinas commented Jun 14, 2016

Because we compare the 'next' time we calculate against the current 'now' time, the issue you mention exists on the branch you started from as well. That might be something we live with for the moment, but as a future improvement the scheduler could keep track of the last time its poller ran and use that as the comparison time instead of the current time.
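
A minimal sketch of that future improvement, assuming a hypothetical lastPollTime the scheduler would persist each time the poller runs (not current Singularity code):

  // Compare candidate recurrences against the poller's last run rather than 'now', so a delayed
  // poll doesn't shift which occurrence counts as "next".
  DateTime compareTime = (lastPollTime != null) ? lastPollTime : now;
  // ...then return the first recurrence strictly after compareTime instead of after 'now'.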

@ghost

ghost commented Jun 16, 2016

Sounds good - I'm going to start trying this in production around next week (about 1,000 long-running scheduled Docker tasks).

Thanks

@ssalinas ssalinas added this to the 0.8.0 milestone Jun 16, 2016

@ssalinas ssalinas merged commit 7af386d into master Jun 16, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
continuous-integration/travis-ci/push The Travis CI build could not complete due to an error

@ssalinas ssalinas deleted the rfc_sched branch Jun 16, 2016

@ghost

ghost commented Jul 26, 2016

Found a big issue in this re-implementation today.
Starting at line 32:
if (now.getMillis() > start.getMillis()) { start = now; }

I think this condition should not be there. I have not fully tested yet, but it likely caused a scheduling bug in my system today (I upgraded to 0.8.0 two days ago).

Let's say a scheduled task runs every 3 hours.

It ran at 8 AM and took 30 minutes to finish. The next task should run at 11 AM, but I think due to this condition it runs at around 11:30 AM (it uses the finish time as the next start time). This way, it keeps pushing the task 30 minutes later each run.
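
To make the drift concrete, here is a minimal trace of the clamp (simplified model where the next run is just start plus the interval; this is not the actual Singularity scheduling code):

  import org.joda.time.DateTime;
  import org.joda.time.Hours;

  DateTime dtStart = new DateTime(2016, 7, 26, 8, 0, 0);  // schedule anchor: 08:00, every 3 hours
  DateTime now = new DateTime(2016, 7, 26, 8, 30, 0);     // recomputed when the task finishes

  DateTime start = dtStart;
  if (now.getMillis() > start.getMillis()) {
    start = now;                                          // the clamp at line 32
  }
  DateTime next = start.plus(Hours.hours(3));             // 11:30, not the expected 11:00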

@ssalinas Your thoughts?

@ssalinas

Member

ssalinas commented Jul 26, 2016

@grepsr, I had originally put that in to save on the number of iterations needed to find the next valid recurrence. Can you post the schedule you were using when you saw this pop up?

@ghost

ghost commented Jul 27, 2016

@ssalinas It happens on all schedules. I have about 300 scheduled tasks running, and all of them got pushed: the next run was scheduled from the date/time when the previous run finished. Then when it ran again, it got pushed further. Only the first run was on time.

I think there is no way around it. To get the right nextSchedule, we need to let it run the full loop from the original DTSTART.

@ghost

ghost commented Jul 27, 2016

An example of a recurrence rule where this schedules the next update incorrectly is below. However, the rule does not really matter; every task is pushed further back depending on how long it takes to run.
FREQ=DAILY;DTSTART=20160402T071500;INTERVAL=1;UNTIL=20250226T000000

The problem is that when the task finishes, it calculates the next update at that very instant; it then goes into the condition where start becomes now, which messes up the next update time.
