Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a cron service and command with cron expressions #1098

Merged
merged 39 commits into from Jan 15, 2020
Merged

Conversation

@fritzmg
Copy link
Collaborator

fritzmg commented Dec 16, 2019

This is a revamp of my original PR #708. The basic features and service tag options are the same - with the addition of being able to use full CRON expressions, instead of just minutely, hourly, etc.!

Command

Cron jobs can now be executed via the following command:

vendor/bin/contao-console contao:cron

This can be used in a minutely crontab for example. When using the command, only cron jobs with no scope definition (null) or the cli scope will be executed, jobs with the scope web will be skipped.

Service Tag

The new service tag contao.cronjob has the following options:

Option Description
interval Can be minutely, hourly, daily, weekly, monthly, yearly or a full CRON expression, like */5 * * * *.
method Will default to __invoke or onMinutely etc. when a named interval is used. Otherwise a method name has to be defined.

Example:

services:
  App\Cron\ExampleCron:
    tags:
      -
        name: contao.cronjob
        interval: * 0 * * *
        method: onDaily

Annotation

Tagging services is also possible via Annotations:

// src/Cron/ExampleCronJob.php
namespace App\Cron;

use Contao\CoreBundle\ServiceAnnotation\CronJob;
use Terminal42\ServiceAnnotationBundle\ServiceAnnotationInterface;

/**
 * @CronJob("* 0 * * *")
 */
class ExampleCronJob implements ServiceAnnotationInterface
{
    public function __invoke(): void
    {
        //
    }
}

When the annotation is used on the class, either the __invoke method will be used - or an auto generated method name (e.g. onMinutely), if present. Basically the same concept as in #1063.

Note: If you need an interval like */5 * * * you need to escape either the * or / with \, since */ would close the PHP comment. This is what line 56 in the Cron annotation class is for. @aschempp may be this is something, that the ServiceAnnotationBundle should handle by itself?

Front End Cron & Route

The /_contao/cron route still exists but now uses the new Cron service instead of the Contao\FrontendCron controller class. Same with the Contao\CoreBundle\EventListener\CommandSchedulerListener (poor man's cron). Cron jobs tagged with the scope cli will not be executed for either.

Additional Dependency

I am using dragonmantank/cron-expression to parse CRON expressions and check if a cron job is, or was, due. I did not use cron/cron because it lacks the latter functionality. The Cron within Contao might or will be executed in irregular intervals, thus it needs to be checked whether a cron job with a defined interval would currently be overdue relative to its last execution.

Example: say you have cron job with an interval of */5 * * * * and you are still using the front end command scheduler. This cron job would be due every 5 minutes of every hour - e.g. 00:00, 00:05, 00:10 etc. However, there might not be a request which would execute the Cron service every minute. If the Cron service is executed at 00:03:58 for example and the cron job has not been executed before, the cron job will be executed right away. Then if the next run of the Cron service is at 00:07:10 the cron job will be executed again, since its next run would have been at 00:05:00, relative to its last execution time. If another request is made at 00:09:46, the cron job will not be executed, since its next scheduled execution time would be at 00:10:00 - etc.

As for the package itself: it looks well maintained, has lots of installs and dependents - and apparently Laravel uses it too.

New Entity

The PR introduces a new Cron entity, which simply saves the "name" of a cron job (consisting of the class name and method) and its last execution.

tl_cron

Backwards Compatibility

I have completely deleted the tl_cron DCA and am now using the tl_cron table for the new format (via an entity). This is probably not ideal for BC? ;)

If we want to keep the old tl_cron table as is, we could use a new tl_cron_runs table instead. In that case: should we still update the values in the tl_cron table? May be simply through the legacy cron.

Scope

In some cases a cron job might want to know in which "scope" it is executed in - i.e. as part of a front end request as part of the cron command on the command line interface. The Cron service will pass a scope parameter to the cron job's method.

namespace App\Cron;

use Contao\CoreBundle\Cron\Cron;

class HourlyCron
{
    public function __invoke(string $scope): void
    {
        // Do not execute this cron job in the web scope
        if (Cron::SCOPE_WEB === $scope) {
            return;
        }

        //
    }
}
@fritzmg fritzmg added the feature label Dec 16, 2019
@fritzmg fritzmg requested review from aschempp and leofeyer Dec 16, 2019
@fritzmg fritzmg self-assigned this Dec 16, 2019
Copy link
Member

Toflar left a comment

Looks really good implementation-wise! I like the concept. I added a few remarks and here are some more general ones:

  • I don't thing the ServiceAnnotationBundle should handle comment escaping. If any, then the annotation extrator itself. But I don't think that's really possible. I think escaping is just fine and it's a matter of adding it to the documentation.

  • I'm fine with the dependency. I checked it myself and there's nothing to worry about.

  • I'm also fine dropping the existing tl_cron table and replacing it with an entity. I don't see any reason why anyone would've ever extended that table or relied on it. And if so, that sounds like an architectural mistake to me that can be easily fixed by providing own ways to store meta data.

  • I think this PR should wait for @ausi's migration framework PR to be merged into master so that the migration is already correct here.

So to sum up: Excellent work @fritzmg - I really like this one as well!

core-bundle/src/Entity/Cron.php Outdated Show resolved Hide resolved
core-bundle/src/Repository/CronRepository.php Outdated Show resolved Hide resolved
$this->connection->exec('UNLOCK TABLES');
}
public function persist(CronEntity ...$entities): void

This comment has been minimized.

Copy link
@Toflar

Toflar Dec 16, 2019

Member

I don't like this method name. Imho it should be persistAndFlush() or something similar. Everybody working with Doctrine knows exactly that persist() does not persist to the database automatically but your method does. That's irritating to me :)

This comment has been minimized.

Copy link
@fritzmg

fritzmg Dec 16, 2019

Author Collaborator

Well, it is called persist in the already existing RememberMeRepository, so ... ;)

This comment has been minimized.

Copy link
@Toflar

Toflar Dec 16, 2019

Member

That's not good either then.

This comment has been minimized.

Copy link
@aschempp

aschempp Dec 17, 2019

Contributor

Actually I don't think the repository should persist or flush at all 🙈

@leofeyer leofeyer added this to the 4.9 milestone Dec 16, 2019
@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 16, 2019

  • I think this PR should wait for @ausi's migration framework PR to be merged into master so that the migration is already correct here.

Agreed, I was planning on doing that anyway, forgot to mention it. I added the migration step as a "reminder" of sorts.

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 16, 2019

The code analyzer reports the following:

 ------ -------------------------------------------------------------------- 
  Line   tests/DependencyInjection/Compiler/AddCronJobsPassTest.php          
 ------ -------------------------------------------------------------------- 
  31     Call to an undefined method                                         
         Symfony\Component\DependencyInjection\ContainerBuilder::method().   
  37     Call to an undefined method                                         
         Symfony\Component\DependencyInjection\ContainerBuilder::expects().  
  242    PHPDoc tag @return contains unresolvable type.                      
 ------ -------------------------------------------------------------------- 

Not sure what do do here - am I using outdated phpunit methods?

Copy link
Contributor

aschempp left a comment

Looking very good indeed! 🎉

Some thoughts:

  • I think the @Cron annotation should be named @CronJob to be consistent.
  • I would also name the table tl_cron_job. There's no value in keeping the same table name, but it can be confusing. To be honest, I did sometimes use that table in my extensions, and using a new name would at least make it obvious that this really has changed.
  • I would suggest to not have onMinutely etc. auto-detection for method names. The implementation in the mentioned PR was only for backwards compatibility. A class annotation should always be for the __invoke method if no method name is given.
  • One conflict I can foresee: what if I want to run the same class & method multiple times, by defining multiple cron job times? This would lead to conflicts about lastRun in the database table, wouldn't it?
core-bundle/src/Cron/Cron.php Outdated Show resolved Hide resolved
core-bundle/src/Cron/Cron.php Outdated Show resolved Hide resolved
/**
* Add a cron service.
*
* @param string $interval The interval as a CRON expression

This comment has been minimized.

Copy link
@aschempp

aschempp Dec 17, 2019

Contributor

why is this @param needed?

This comment has been minimized.

Copy link
@fritzmg

fritzmg Dec 17, 2019

Author Collaborator

To provide additional info about the parameter (i.e. that it is supposed to be a "CRON expression").

*
* @param string $interval The interval as a CRON expression
*/
public function addCronJob($service, string $method, string $interval, int $priority = 0, string $scope = null): void

This comment has been minimized.

Copy link
@aschempp

aschempp Dec 17, 2019

Contributor

is $service a callable or object (as of PHP 7.2 if I'm right)? Maybe we pass use a callable instead (which in most cases will be just the service object for __invoke, or an array of object and method)?

This comment has been minimized.

Copy link
@fritzmg

fritzmg Dec 17, 2019

Author Collaborator

Hmm... well I need the method name for the name in the table anyway. Or should we ommit ::method for the name, if the object is a callable (and thus only save the class name as the name of the cron job)?

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 17, 2019

  • One conflict I can foresee: what if I want to run the same class & method multiple times, by defining multiple cron job times? This would lead to conflicts about lastRun in the database table, wouldn't it?

Very good point, I haven't thought of that, since I always only considered the annotation (where this wouldn't be possible?). But it is perfectly possible to use multiple tags for the same service (and thus the same method) of course.

To fix this, I would also save the defined interval in the tl_cron/tl_cron_job table (and create a unique index over name,interval).

Copy link
Contributor

aschempp left a comment

More thoughts: I think the current Cron implementation could be optimized:

  1. Why is the priority not applied in the compiler pass and therefore automatically correct in the addCronJob method? Do we really need another priority inside the cron handler?
  2. Maybe we should use a data object for the cron definition instead of an array?
  3. if one of the cronjob currently fails, all jobs will be marked as executed because that's done before running them. This certainly will be a problem and should be re-thought.

I can't remember why we had the cron scopes at all? I mean how can I ever want a cron job not to run when I run my cronjobs? It would basically mean I will always need to setup the CLI and web callbacks to execute everything? I know there might be jobs that can't run on the CLI for legacy reasons, but shouldn't that be deprecated/disallowed in the new setup?

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 17, 2019

  1. Why is the priority not applied in the compiler pass and therefore automatically correct in the addCronJob method? Do we really need another priority inside the cron handler?

True, this could already be sorted in the in the compiler pass.

  1. Maybe we should use a data object for the cron definition instead of an array?

I am all for that. Do we use anything like that in the core yet, so that I have a reference?

  1. if one of the cronjob currently fails, all jobs will be marked as executed because that's done before running them. This certainly will be a problem and should be re-thought.

I would simply use a try/catch block within the cron job execution loop, creating a log entry when a job fails. This way the other jobs can still be executed and all of them being marked as executed will be correct then.

There might be edge cases where the whole cron service fails and thus jobs are being marked as run even though they weren't. I'd rather not introduce an additional database flag or anything like that showing the current state of the job, due to locking and concurrency.

@aschempp

This comment has been minimized.

Copy link
Contributor

aschempp commented Dec 17, 2019

  1. Maybe we should use a data object for the cron definition instead of an array?

I am all for that. Do we use anything like that in the core yet, so that I have a reference?

The migrations command PR does: https://github.com/contao/contao/pull/709/files#diff-68d11ab2b9b2006b39d4cb82a8771588

I would simply use a try/catch block within the cron job execution loop, creating a log entry when a job fails. This way the other jobs can still be executed and all of them being marked as executed will be correct then.

try-catch and logging would absolutely make sense to me. Be aware that we might want to log to tl_log (ContaoContext).

There might be edge cases where the whole cron service fails and thus jobs are being marked as run even though they weren't. I'd rather not introduce an additional database flag or anything like that showing the current state of the job, due to locking and concurrency.

How about we only update the cronjob table after the job has been executed? Or you mean you don't want to lock the tables for that long period?

To fix this, I would also save the defined interval in the tl_cron/tl_cron_job table (and create a unique index over name,interval).

Is there any chance we can use this to actually manipulate the cron job run times as an application maintainer (which is a feature I'd like to see, but has low prio)? 🙃

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 17, 2019

How about we only update the cronjob table after the job has been executed? Or you mean you don't want to lock the tables for that long period?

Yeah, I did not want to lock the tables for the complete duration of all cron jobs. Suppose you set contao:cron to execute minutely in your crontab... what happens if one of those cron jobs runs for an hour - which would mean the table is locked for an hour?

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 17, 2019

Is there any chance we can use this to actually manipulate the cron job run times as an application maintainer (which is a feature I'd like to see, but has low prio)? 🙃

Hm, couldn't you just implement your own compiler pass (which is run before AddCronJobsPass) and change the service tag accordingly?

@aschempp

This comment has been minimized.

Copy link
Contributor

aschempp commented Dec 18, 2019

Yeah, I did not want to lock the tables for the complete duration of all cron jobs. Suppose you set contao:cron to execute minutely in your crontab... what happens if one of those cron jobs runs for an hour - which would mean the table is locked for an hour?

Hmm, not sure if that solves it actually. The minutely cronjob would be started a minute later, even though it is still running for an hour?

Is there any chance we can use this to actually manipulate the cron job run times as an application maintainer (which is a feature I'd like to see, but has low prio)? 🙃

Hm, couldn't you just implement your own compiler pass (which is run before AddCronJobsPass) and change the service tag accordingly?

Very true, but complicated 😂
I liked the idea of cron/cron where a app user can decide what job to run when. In the Managed Edition®, it mostly makes sense to have it defined by the dev, but maybe I wanna change it. But nevermind, thats not really in this PRs scope.

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Dec 18, 2019

Hmm, not sure if that solves it actually. The minutely cronjob would be started a minute later, even though it is still running for an hour?

Example: you have set up to run the Contao cron every minute. There is a job that runs for 60 minutes. Every minute, this job is executed and then runs in parallel with all the other executions, if the database table is not locked during the whole execution of all crons. This would be the current behaviour and also the old behaviour.

However, if you do lock the table for the whole execution, then the Contao cron process is still spawned every minute. However, the second run has to wait, until the first one is finished, then a minute later, a new process is spawned that also has to wait until the table is not locked anymore, etc. This would spawn infinite processes. Unless the process fails, when a database lock cannot be achieved? I don't actually know what happens then. I'd have to test.

Anyhow: we could also save the state of each cronjob in the database and thus automatically skip its execution, if it is still running. So tl_cron_job.state would either have nothing, running, finished or failed in it and a job (with a given interval) can only be executed again, if its state is either empty or not running.

Very true, but complicated 😂
I liked the idea of cron/cron where a app user can decide what job to run when. In the Managed Edition®, it mostly makes sense to have it defined by the dev, but maybe I wanna change it. But nevermind, thats not really in this PRs scope.

Hm, what about something like this:

class CronInterface
{
    /**
     * Returns all added cron jobs.
     */
    public function getCronJobs(): array;

    /**
     * Returns a specific cron job, given its name and interval.
     * If the interval is ommitted, the first cron job found with the given name will be returned.
     */
    public function getCronJob($name, $interval = null): ?CronJobInterface
}
class CronJobInterface
{
    /**
     * Returns a callable or array.
     */
    public function getService()

    public function getName(): string
    public function getInterval(): string
 
    /**
     * Adjust the interval of the cron job.
     */
    public function setInterval(string $interval): self
}

Haven't really looked at how cron/cron handles this though :)

@aschempp

This comment has been minimized.

Copy link
Contributor

aschempp commented Dec 20, 2019

I just discussed my thoughts and issues with @fritzmg, here are some details:

  1. We discussed the issue of jobs running concurrently, meaning if a minutely job runs longer than one minute, it could be running twice. However, we both agreed this is not really an issue, or not one that we (Contao) should fix. The exact same would apply to a Crontab, it would simply run the process twice and not care about already running jobs.

  2. I complained about the scope idea, because I think 99% of devs should not think about it. The (only) viable solution mentioned by @Toflar (listening in on my call) was a case of a job that knows it's going to run for longer than the web process can handle, e.g. to behave differently in that case. We agreed that it would be better to check PHP_SAPI or have another runtime information to know if the command or the web job is running. It's not something one should differentiate in the annotation.

  3. There are two concerns remaining: custom scheduling for jobs and execution time

    • Custom scheduling for jobs: it would be nice if it was possible to override the annotation when a job should be executed. E.g. if Contao does logrotation once a week, it might make sense to run that job once a day instead.

      Solving this is actually pretty simple: as this is an advanced feature, I'd assume one would have a Crontab available. So instead of running the contao:cron command, one should be able to run contao:cron --job=MySuperJob::method. This way I can execute each job at the time I want, and possibly even in parallel by scheduling all of them individually in Crontab.

    • Execution time: Currently, we're fetching all current jobs from the DB, mark them as executed now and then run them sequentially. Apart from issues if one job quits the app, this also means a second job might run 5 minutes later, if the first job takes 5 minutes to complete.

      Given the above contao:cron --job=… option, the most simple solution would be to actually start a background process for every cron job, when contao:cron is executed without arguments. This means one process can die without affecting the others, and they are actually run in parallel and asynchronous. That's obviously something that only works on the command line and not in the web process.

As discussed with @fritzmg, the two things in point 3 are not necessarily part of this PR and can be added/improved in a later implementation (possibly for Contao 4.10). We just need to make sure the annotations and interfaces are correct in the first place, meaning point 2 needs to be addressed before merging this.

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 2, 2020

  • I would suggest to not have onMinutely etc. auto-detection for method names. The implementation in the mentioned PR was only for backwards compatibility. A class annotation should always be for the __invoke method if no method name is given.

But this isn't just about service annotations. If you define your service tag in your service configuration, it would still be useful to not need to define the method name as well? Or should this also be restricted to only the __invoke method?

@fritzmg fritzmg force-pushed the feature/cron-scheduler branch from c2385fc to 0d1527e Jan 2, 2020
@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 2, 2020

  • I have removed the scope parameter in e7fc60e for now.
  • Since we want to execute the jobs in parallel in the future, I have also removed the priority parameter, since that wouldn't make any sense then anymore.
  • tl_cron is now simply dropped by default (as the DCA is missing) and the entity uses tl_cron_job instead.
  • All the relevant classes are now called CronJob instead of Cron (ce61eb0).
  • DateTimeInterface instead of DateTime is now used (f939b08).
@leofeyer leofeyer changed the title [RFC] Add Contao Cron service & command with CRON expressions Add a cron service and command with cron expressions Jan 2, 2020
@leofeyer

This comment has been minimized.

Copy link
Member

leofeyer commented Jan 2, 2020

Contao 4.9 is now in the review phase in which we try to review and merge the remaining pull requests that have been created during the development phase. Your pull request does not yet meet the requirements to be reviewed though:

  • Make sure that the PR is feature complete.
  • Rebase the PR onto the latest master branch.
  • Address the requested changes if any.
  • Make sure that all CI checks are successful.

Please fix this soon if you can, because the review phase ends on January 15th and we need some time to review the PR before we can merge it.

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 2, 2020

Regarding scope matching: initially I thought, the simplest solution for anyone who wants to implement a cron, that is only executed (or not executed) in the "web scope" is to do the following:

// Do not execute cron in web scope
if (null !== $this->requestStack->getCurrentRequest()) {
    return;
}

Unfortunately, that will not work, since the CommandSchedulerListener is executed in the kernel.terminate event and there, the current request will not be present anymore.

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 5, 2020

I have added some changes and a new proposal for scoped cron jobs in 0a619d4:

  • A new CronJob class now holds the service object, method name, interval and cron job name.
  • The new class also handles execution of the cron job.
  • A new ScopedCronJobInterface can now be optionally implemented, using a setScope method.
  • The cron service will set the scope for all cron jobs and the CronJob class will pass this down to the service, if it is an instance of ScopedCronJobInterface.

@leofeyer @aschempp @Toflar wdyt?

I also renamed the annotation from Cron to CronJob and the service tag from contao.cron to contao.cronjob in e248bc7 to be consistent with the rest.

@fritzmg fritzmg force-pushed the feature/cron-scheduler branch 2 times, most recently from 2d09608 to 256aca8 Jan 5, 2020
@leofeyer

This comment has been minimized.

Copy link
Member

leofeyer commented Jan 6, 2020

Can you explain why we need a LegacyCron class which only supports a handful of hardcoded cron jobs? Can we not refactor theses methods?

@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 6, 2020

The LegacyCron class executes all cronjobs registered via $GLOBALS['TL_CRON']. It does not execute any hardcoded cron jobs. I think you confused the check for the core cronjobs: this is just there to issue a deprecation warning, whenever someone uses $GLOBALS['TL_CRON'] instead of a tagged service. The check ensures that this deprecation warning is not issued for the legacy cronjobs of the core.

@fritzmg fritzmg force-pushed the feature/cron-scheduler branch from 388b121 to c421efa Jan 15, 2020
@fritzmg

This comment has been minimized.

Copy link
Collaborator Author

fritzmg commented Jan 15, 2020

Further changes:

  • $scope parameter on Cron::run is now mandatory.
  • ScopeAwareCronJobInterface removed and scope is instead passed to the cron job's method.
  • Cron::addCronJob now expects a CronJob object.
  • CronJobRepository::persistAndFlush removed.
@fritzmg fritzmg requested review from leofeyer and aschempp Jan 15, 2020
Copy link
Member

Toflar left a comment

I like the concept now :)

return;
}
$serviceIds = $container->findTaggedServiceIds('contao.cronjob');

This comment has been minimized.

Copy link
@Toflar

Toflar Jan 15, 2020

Member

Priorities make no sense in cronjobs, native ones don't have one either.

leofeyer added a commit that referenced this pull request Jan 15, 2020
Description
-----------

While working on #1098, another discrepancy with the `RememberMeRepository` that I found was, that the `services.yml` defines two service constructor for the `RememberMeRepository` - even though it just takes one. The second parameter would only be necessary, if the `RememberMeRepository` did not define its own constructor. However it does define its own constructor, hardcoding the entity class within PHP.

Commits
-------

7ba6212 remove superfluous argument
5bd7353 Update ContaoCoreExtensionTest.php
de5f1f7 remove use statement
@fritzmg fritzmg dismissed stale reviews from leofeyer and Toflar via 80a2dab Jan 15, 2020
@leofeyer leofeyer merged commit ad125f7 into master Jan 15, 2020
9 checks passed
9 checks passed
Coverage
Details
Coding Style
Details
PHP 7.2
Details
PHP 7.3
Details
PHP 7.4
Details
Prefer Lowest
Details
Bundles
Details
Windows
Details
codecov/project 89.51% (-0.14%) compared to 275dd6a
Details
@leofeyer leofeyer deleted the feature/cron-scheduler branch Jan 15, 2020
@leofeyer

This comment has been minimized.

Copy link
Member

leofeyer commented Jan 15, 2020

Thanks a lot @fritzmg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.