Able to set the maximum length of a job #268

Open
AndrewFarley opened this issue Mar 11, 2020 · 0 comments

Feature Request

I'd like to be able to set the maximum length of a job, after which it is killed. This kill would be considered an error condition and would trigger any onError logic.

Reasoning

This, especially when combined with preventing overlapping jobs (#262), can prevent runaway processes and/or infinite loops, which can be present not only in your own code but also in linked libraries and external dependencies. I find that code deployed to a variety of systems inevitably hits odd edge cases and exceptions that are not handled well, such as the database temporarily "going away" or the network or routing being jittery.

These edge cases cause havoc on basically every cron system out there and can leave many concurrent processes (sometimes hundreds) running on a system unnecessarily. The only way to prevent this properly is to have a feature toggle in your cron engine that prevents overlap (see: #262) AND allows you to set a maximum runtime.

Either of these features alone doesn't properly solve the entire problem. For example:

  • If you just set a maximum length without the no-overlap feature, then you could never set the max length longer than the run interval. And in some edge cases I've seen the need for tasks that run long when they have to (eg: mass mailing systems) but are scheduled on a very short interval (1 minute).
  • Similarly, with only the preventOverlap feature, if some code caused an infinite loop (or, in some cases, an infinite "stall"), you could hit the edge case where a task never finishes and is doing nothing, and it would never be executed again until someone manually killed that process. This is equally frustrating and useless.

Example

jobs:
  backupDatabaseHourly:
    cmd: /usr/local/bin/mysqldump -uuser -ppassword databasename > /mnt/backups/
    time: '0 0'
    maxRuntime: 5400  # in seconds; I chose 1.5 hours for this intentionally
    preventOverlap: true  # this would come from #262
    onError: Stop
    notifyOnSuccess:
      - type: program
        path: /usr/local/bin/slack-success.sh "Successfully backed up database"
    notifyOnFailure:
      - type: program
        path: /usr/local/bin/slack-failure.sh "Failed backing up database because: $1"

Spelled out in English, this job would run the backupDatabaseHourly job's cmd hourly, at second 0 of minute 0 of every hour. It has a maximum runtime of 1.5 hours, after which it will be killed. It has preventOverlap enabled, which is critical for this task: if multiple runs tried to execute concurrently, they could stack up and cause outages/downtime in the database. On success or failure it will notify the team on Slack via a helper script.

Unnecessary Background Info

I have a new client with complex cron requirements who is suffering under typical cron because of the shortcomings mentioned above. So I had a google around for potential alternatives and stumbled upon Jobber. I like the core feature set: the ability to list the jobs and their status, and the Sinks. I'm going to play with it a bit shortly. To me, Jobber seems like a clear winner for this client's use case; it just needs a few more features.

Now onto the "why"... I know this mechanism works extremely well because about 7 years ago, for a client with a very complex set of strict requirements (>200 cron jobs, no overlap ever, and some jobs (mailers) that needed to be able to run for a very long time), I developed a Python-based cron engine with support for preventing overlap and a maximum execution time. That system is still in place to this day and I haven't touched it one bit. I still have the code lying around and am considering using it, but it lacks some features that Jobber has (such as listing job status and the sinks), which makes Jobber quite appealing to me compared with adding those features to my Python code.

Personal Note: If I have the free time, I may even make the PR for this (and the preventOverlap) feature myself in the coming weeks/months.
