
Ent Throttling

Mike Perham edited this page Nov 7, 2019 · 14 revisions

With Faktory, there is no way to limit job dispatch. You could implement distributed locks within your application code, but that's a daunting task for many. Faktory Enterprise allows you to throttle the rate at which jobs are fetched from a given queue.

Use Cases

  1. You have an Instagram scraping queue whose SLA requires you to process no more than 4 jobs concurrently.
  2. You have a bulk queue for low-priority work but don't want a huge mass of bulk jobs taking every available processing thread and delaying the fetch of critical jobs. You can limit each worker to processing only 5 bulk jobs at a time.

Throttles

Queue throttling is declared within conf.d/throttles.toml:

[throttles]
instagram = { concurrency = 4, timeout = 60 }
bulk = { worker = 5, timeout = 60 }

You declare the queue name and the set of properties for its associated throttle.

Types

There are two throttling algorithms today:

  1. Concurrency - process up to N jobs concurrently across your entire Faktory worker cluster
  2. Per Worker - process up to N jobs concurrently within each worker process, for every worker in your Faktory cluster

Reminder: a worker is a process which fetches jobs from Faktory. You might have M worker processes per machine, each with X threads.

Deployment Example

Your worker cluster has 10 machines. Each machine has 4 worker processes, each with 5 threads. You have a grand total of 200 worker threads, so your cluster can execute 200 jobs concurrently.

  • With instagram = { concurrency = 4 }, your cluster can only execute 4 jobs concurrently from the instagram queue across all 200 threads. You will have at least 196 threads available for processing other queues.
  • With bulk = { worker = 2 }, your cluster can only execute 80 jobs concurrently from the bulk queue: 10 * 4 * 2. You will have at least 120 threads for processing other queues.
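The arithmetic in this example can be sketched as a small function. The name max_throttled_jobs and its parameters are made up for illustration; it only restates the calculation above and is not part of Faktory.

```python
def max_throttled_jobs(machines, procs_per_machine, threads_per_proc,
                       *, concurrency=None, worker=None):
    """Upper bound on jobs from one throttled queue running at the same time."""
    total_threads = machines * procs_per_machine * threads_per_proc
    if concurrency is not None:
        # Concurrency throttle: one limit shared by the whole cluster.
        limit = concurrency
    elif worker is not None:
        # Per Worker throttle: the limit applies to each worker process,
        # and a process cannot run more jobs than it has threads.
        limit = machines * procs_per_machine * min(worker, threads_per_proc)
    else:
        # No throttle: every thread may take a job from the queue.
        limit = total_threads
    return min(limit, total_threads)


TOTAL_THREADS = 10 * 4 * 5
instagram_limit = max_throttled_jobs(10, 4, 5, concurrency=4)
bulk_limit = max_throttled_jobs(10, 4, 5, worker=2)
```

Subtracting each limit from TOTAL_THREADS gives the minimum number of threads left over for other queues.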

Timeout

Throttles require locks to be taken during FETCH and released upon ACK or FAIL. If a worker process crashes, any locks held on its behalf are effectively orphaned. To handle this case, all throttles must have a timeout, after which a lock expires. Make sure the timeout is set longer than you expect any job within that queue to take! The default timeout is 60 seconds if not configured.
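To illustrate why the timeout matters, here is a toy in-memory model of a concurrency throttle whose locks expire. Everything in it (the class ExpiringThrottle, its methods, the injectable clock) is an assumption made for illustration; Faktory's actual implementation is not described on this page and will differ.

```python
import time


class ExpiringThrottle:
    """Toy concurrency throttle: at most `limit` unexpired locks at a time."""

    def __init__(self, limit, timeout=60.0, clock=time.monotonic):
        self.limit = limit
        self.timeout = timeout
        self.clock = clock      # injectable so the behavior can be simulated
        self._locks = {}        # job id -> time at which its lock expires

    def _purge_expired(self):
        # Drop locks whose timeout has passed, e.g. after a worker crash.
        now = self.clock()
        self._locks = {jid: exp for jid, exp in self._locks.items() if exp > now}

    def fetch(self, jid):
        """Try to take a lock for `jid`; return True if the job may run."""
        self._purge_expired()
        if len(self._locks) >= self.limit:
            return False
        self._locks[jid] = self.clock() + self.timeout
        return True

    def ack(self, jid):
        """Release the lock; return False if it had already expired."""
        self._purge_expired()
        return self._locks.pop(jid, None) is not None
```

In this model, a job that never calls ack blocks its slot only until the timeout passes. A job that runs longer than the timeout loses its lock while still executing, so another job can start and the limit is briefly exceeded, which is why the timeout should exceed the longest expected job.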

Metrics

Throttles will send real-time metrics to Pro-Metrics for you to monitor:

  • throttle.lock is a counter, incremented for each successful queue lock.
  • throttle.unlock is a timer, recorded for each unlock, with the time the lock was held.

Each metric is tagged with the associated queue name.

Web UI

The Faktory Web UI contains a Throttles tab which shows each currently declared throttle and associated runtime usage metrics.

(Screenshot: throttling UI)

The table shows basic usage metrics for each throttle.

Taken and Free are not calculated for Per Worker throttles because those throttles are more dynamic: calculating the values would require a more heavyweight O(n) Redis scan.

Overage and Reclaimed are important metrics to watch: if they aren't zero, the throttle SLA can be violated. Overage means a job ran over the throttle timeout; Reclaimed is the number of times an expired lock was reused before being unlocked, which can mean the throttle was violated.

Notes

  • When calling FETCH critical default bulk, throttled queues (in this example, bulk) are checked before normal queues.
  • If you make changes to the throttle setup in throttles.toml, send Faktory the HUP signal to reload the configuration. The changes will take effect immediately.