Allow site admins to "speed up" dequeuing process to more than 1/5min #47

Open
jerclarke opened this issue Jan 17, 2018 · 0 comments

Right now Amber has a very sane and safe approach to slowly working down the queue of posts -- one at a time, once every five minutes -- which is a good default, as many sites might struggle with a heavier workload, and in the long run, most sites will be "stable" with the default slow rate of checking URLs.

The problem is that for larger sites -- like ours, which has ~100k posts with a dozen or more links each -- this may never be a stable rate of URL checking. If the volume of posts per day is high, and the volume of links per post is high, then in the long run, there is a perpetually growing queue with no warning to site administrators. This ever-growing queue gets in the way of the long-term rechecking of all URLs to determine if they have since gone offline.

Admittedly, this isn't insurmountable with the current code, as the Amber Dashboard allows us to see the queue size and, if it's growing, click "Snapshot all new links" to (hopefully quickly) clear out the queue in one sitting.

That said, it would be much better if the plugin allowed site administrators to control the rate of dequeuing of URLs, since in many cases the rate can be increased significantly without any performance problems, and this can permanently remove the need for administrators to worry about queue length.

Current Behavior: Hardcoded queue management configuration

The Amber WordPress plugin hooks into the WordPress cron system with an "every five minutes" schedule and executes Amber::dequeue_link() once per 5 minutes with the Amber::cron_event_hook() method during the amber_cron_event_hook action.

Both of these factors, the five-minute cron schedule and the single dequeue_link() call per run, are currently hardcoded in the plugin, making it impossible to alter them without editing the plugin itself, even with the expertise and time for plugin development.

This rigidity is unnecessary, and IMHO both of these values can easily be made filterable in ways that would add only a few lines of PHP to your plugin, and subsequently would allow users to alter the plugin behavior with only a few lines of PHP on their end.

Ideal behavior: Filter for cron schedule and filter for number of URLs handled per cron run

So my proposal is that you add a single location in the Amber plugin where the cron schedule is determined, and in that location pass the value through WP's apply_filters() function so that other plugins can modify it with add_filter().

Similarly, the Amber::cron_event_hook() method should be modified to have a "number of URLs to dequeue" variable, which is passed through apply_filters(), and which is used to run Amber::dequeue_link() that many times on each run.
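To make the proposal concrete, here is a rough sketch of what the plugin side could look like. It is not taken from Amber's code: the filter names `amber_cron_schedule` and `amber_dequeue_count`, the default schedule name `fiveminutes`, and all `amber_sketch_*` function names are placeholders I made up, and it assumes `Amber::dequeue_link()` can be called statically as written above.

```php
<?php
// Hypothetical plugin-side sketch. Filter names and the default schedule
// name are assumptions for illustration, not Amber's actual identifiers.

/** Schedule name for Amber's cron event, overridable by other plugins. */
function amber_sketch_get_cron_schedule() {
    // 'fiveminutes' stands in for whatever schedule name the plugin registers today.
    return apply_filters( 'amber_cron_schedule', 'fiveminutes' );
}

/** Run the dequeue callback $count times, never fewer than once. */
function amber_sketch_dequeue_batch( $count, $dequeue = array( 'Amber', 'dequeue_link' ) ) {
    $count = max( 1, (int) $count );
    for ( $i = 0; $i < $count; $i++ ) {
        call_user_func( $dequeue );
    }
    return $count;
}

/** Possible body for Amber::cron_event_hook(). */
function amber_sketch_cron_event_hook() {
    // A default of 1 preserves the current behavior.
    amber_sketch_dequeue_batch( apply_filters( 'amber_dequeue_count', 1 ) );
}

/** Schedule the event with the filtered recurrence if it is not scheduled yet. */
function amber_sketch_schedule_event() {
    if ( ! wp_next_scheduled( 'amber_cron_event_hook' ) ) {
        wp_schedule_event( time(), amber_sketch_get_cron_schedule(), 'amber_cron_event_hook' );
    }
}
```

One caveat with this approach: an event that is already scheduled keeps its old recurrence, so the plugin would probably need to clear and reschedule it (for example with `wp_clear_scheduled_hook()`) when the filtered value changes.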

Finally, the documentation should be updated to point out that these filters exist for advanced users, and simple code examples should be given of their usage.
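As an idea of what such a documentation example might look like from the site owner's side, here is a minimal sketch. `cron_schedules` is the standard WordPress filter for registering intervals; the two `amber_*` filter names are hypothetical and would be whatever names the plugin ends up exposing, and the `mysite_*` names are placeholders.

```php
<?php
// Site-side sketch, e.g. in a small mu-plugin. The 'amber_*' filter names
// are hypothetical; 'cron_schedules' is the standard WordPress filter.

/** Register a one-minute schedule with WP-Cron. */
function mysite_add_amber_minute_schedule( $schedules ) {
    $schedules['mysite_every_minute'] = array(
        'interval' => 60, // seconds
        'display'  => 'Every minute',
    );
    return $schedules;
}

/** Ask Amber (via the hypothetical filter) to use that schedule. */
function mysite_amber_cron_schedule( $schedule ) {
    return 'mysite_every_minute';
}

/** Dequeue two URLs per cron run instead of one. */
function mysite_amber_dequeue_count( $count ) {
    return 2;
}

// The guard only lets this file load outside WordPress (for example in a test).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'cron_schedules', 'mysite_add_amber_minute_schedule' );
    add_filter( 'amber_cron_schedule', 'mysite_amber_cron_schedule' );
    add_filter( 'amber_dequeue_count', 'mysite_amber_dequeue_count' );
}
```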

This would allow major sites like ours to solve the problem for ourselves, without requiring any additional UI that might confuse users, bloat the interface or create additional dev burden.

Performance considerations

For the sake of completeness, I'll briefly outline the reasons someone would use one or the other of these two means of increasing the number of URLs processed.

Increase cron frequency: By filtering the cron schedule from 5 minutes (300s) to 60s

  • As long as the site is loaded by anyone in any way (admin or frontend or RSS etc.) at least once per minute (triggering a cron run) the effect will be to process 5 times as many URLs in the same time period as the defaults.
  • Huge increase in throughput, very likely to clear the backlog quickly and stop it from ever growing again.
  • Each cron still only does a single URL, which takes from 0-5 seconds, and is unlikely to break the overall cron process (which may be much longer than 5 seconds, due to other cron events, and can cause havoc in a WP site if it goes over the max execution time of PHP, often only 30s)
  • Relies on constant traffic to be effective. If only one page is loaded every five minutes the cron will thus only be triggered once every 5 minutes, resulting in the same speed of URL processing as the default. By extension, we can imagine a very rarely loaded site with traffic once per day, and which would thus only process one link per day. This is extremely unlikely, but clarifies the consideration.
  • The other theoretical downside is that for many sites this will cause more overall cron runs as the cron system only fires when there are active tasks pending, and e.g. setting a 1 min schedule means that on many of those minutes, Amber's callback will be the only one pending. IMHO this is not a meaningful concern, as even on shared hosting, one request per minute should not be a hosting burden, and the downsides of this are far less disruptive than the potential risk of running long processes during cron runs on similarly weak infrastructure.
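The traffic dependence described in these bullets can be put into a small back-of-the-envelope calculation. This is only a rough model under the assumption that cron fires at most once per schedule interval and only when a request arrives; the function name and parameters are made up for illustration.

```php
<?php
/**
 * Rough upper bound on URLs processed per day.
 * WP-Cron only fires on page loads, so the effective interval is the longer
 * of the schedule interval and the typical gap between requests.
 */
function amber_sketch_urls_per_day( $schedule_seconds, $traffic_gap_seconds, $urls_per_run = 1 ) {
    $effective_interval = max( $schedule_seconds, $traffic_gap_seconds );
    return ( (int) floor( 86400 / $effective_interval ) ) * $urls_per_run;
}
```

With the default 300s schedule and steady traffic this gives 288 URLs per day, and a 60s schedule gives 1440. If the site only gets a request every five minutes, the 60s schedule falls back to 288, and a site loaded once per day processes one URL per day.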

Increase URLs processed per cron run: By filtering the number of times Amber::dequeue_link() gets run, e.g. setting it to run 5 times per run.

  • As long as the site is loaded at least once every 5 minutes (default cron schedule) this will result in the same 5x speedup of URL processing.
  • Benefit is that it will increase speed even if the site isn't loaded constantly, and if it is loaded less than once per 5 minute period, this will be the only way to increase the speed of URL processing.
  • Downside is very important and usually will make the other option (filtering cron schedule) preferable: Increasing the number of URLs processed per run has a high likelihood of interfering with overall cron completion.
  • If you set it to process 10 URLs, and they each take 5 seconds, that is already 50 seconds for the cron run. If your PHP max execution time is 30 seconds, that cron run will never complete, and your whole WordPress site is likely to malfunction as cron processes never get a chance to complete. Even if your max execution time is 60 seconds, it still only leaves 10s for all other processes.
  • Thus the number of URLs processed per run should never be set very high, and ideally would only be used if increasing the cron frequency to once per minute or once per 30 seconds doesn't have enough impact on the queue size (unlikely)
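The time-budget arithmetic in the bullets above can be sketched as follows. The 5 second worst case per URL comes from the estimate earlier in this issue; the amount of time reserved for other cron events is an assumption each site would have to measure, and the function names are placeholders.

```php
<?php
/** Worst-case seconds one cron run spends dequeuing Amber URLs. */
function amber_sketch_worst_case_run_seconds( $urls_per_run, $seconds_per_url = 5 ) {
    return $urls_per_run * $seconds_per_url;
}

/** Largest batch that fits in the time limit after reserving time for other cron events. */
function amber_sketch_max_urls_per_run( $max_execution_seconds, $reserved_seconds, $seconds_per_url = 5 ) {
    $budget = $max_execution_seconds - $reserved_seconds;
    return max( 0, (int) floor( $budget / $seconds_per_url ) );
}
```

For example, 10 URLs per run is a 50 second worst case. With a 30 second limit and 10 seconds reserved for other cron events, at most 4 URLs per run would fit.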

So for most sites, altering the cron schedule to run Amber::cron_event_hook() more often will be the best solution. It will work well for large sites with a large corpus of links, which will almost always also get regular traffic -- if only from search engines updating their caches of said corpus -- to match their large URL queue in Amber.

Nice to have: wp-admin setting for cron schedule

If you were going to add a UI setting inside Amber to control the rate of URL dequeuing, I would definitely make it control the cron schedule, and leave the number of URLs dequeued per run to a filter as described above.

A pulldown menu with [Every 10 minutes|Every 5 minutes|Every 1 minute] in Amber Settings > Storage Settings would be extremely useful for us and probably many other sites.

Like I said though, a filter and a little documentation would work just as well for power users, who are most likely to need this option.

Thanks for your attention and for considering this request.
