Ent Metrics

Faktory Enterprise can emit real-time metrics to Statsd for monitoring and alerting.

Statsd

In /etc/faktory/conf.d/statsd.toml, add the following section to enable statsd metrics:

[statsd]
  # required, location of the statsd server
  location = "hostname:port"

  # Prepend all metric names with this value, defaults to 'faktory.'
  # If you have multiple Faktory servers for multiple apps reporting to
  # the same statsd server you can use a multi-level namespace, 
  # e.g. "app1.faktory.", "app2.faktory." or use a tag below.
  #namespace = "faktory."

  # optional, DataDog-style tags to send with each metric.
  # keep in mind that every tag is sent with every metric so keep tags short.
  #tags = ["env:production", "region:us-east-1a"]

  # Statsd client will buffer metrics for 100ms or until this size is reached.
  # The default value of 15 tries to avoid UDP packet sizes larger than 1500 bytes.
  # If your network supports jumbo UDP packets, you can increase this to ~50.
  #bufferSize = 15
  
  # optional, also calculate queue latency for these queues (see Latency below)
  queueLatency = ["critical", "default"]

Tags must conform to DataDog's specifications:

tag must match "\A[a-zA-Z][\w\-\:\.\/]*\z"
host, device, source, and service are reserved words
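
The two rules above can be checked before tags go into statsd.toml. The following is a minimal sketch, not part of Faktory: the function name is_valid_tag is hypothetical, and it assumes a reserved word is rejected both as a bare tag and as the key of a "key:value" tag, which the rules above do not spell out.

```python
import re

# Pattern from the rules above. fullmatch() stands in for \A ... \z, and
# re.ASCII keeps \w limited to [a-zA-Z0-9_].
TAG_PATTERN = re.compile(r"[a-zA-Z][\w\-:./]*", re.ASCII)

# Reserved words listed above.
RESERVED = {"host", "device", "source", "service"}


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag matches the pattern and is not reserved."""
    if TAG_PATTERN.fullmatch(tag) is None:
        return False
    # Assumption: a reserved word is rejected both as a bare tag and as the
    # key portion of a "key:value" tag.
    key = tag.split(":", 1)[0]
    return key not in RESERVED
```

With these assumptions, is_valid_tag("env:production") passes, while "1abc" (starts with a digit) and "host:web1" (reserved key) are rejected.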

Metrics

Global

Faktory sends global metrics, similar to those seen in the Web UI, every 30 seconds.

Name	Type	Description
processed	Gauge	Total number of jobs processed (success = processed - failures)
failures	Gauge	Total count of failed job executions
scheduled	Gauge	Current number of scheduled jobs
retries	Gauge	Current number of jobs to be retried
dead	Gauge	Current number of Dead jobs
busy	Gauge	Current number of jobs being processed
ops.connections	Gauge	Faktory client network connections
ops.commands	Gauge	Client commands processed by Faktory
ops.memory	Gauge	Faktory RAM usage, in bytes
ops.redis.connections	Gauge	Redis network connections
ops.redis.memory	Gauge	Redis RAM usage, in bytes
enqueued	Gauge	Total count of all jobs within queues
enqueued.{name}	Gauge	Size of {name} queue

Global metrics are tagged only with the set of tags configured in TOML above. Note that queues hold jobs that are ready to execute now. Scheduled, Retries, and Dead jobs are not enqueued.
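
As a small illustration of how the names in the table combine with the namespace setting, and of the success relationship noted for processed, here is a hedged sketch. Both helper names are hypothetical; the only facts taken from this page are the default namespace 'faktory.' and success = processed - failures.

```python
def full_metric_name(metric: str, namespace: str = "faktory.") -> str:
    """Prepend the configured namespace to a metric name from the table."""
    return namespace + metric


def successful_jobs(processed: int, failures: int) -> int:
    """Apply the relationship from the table: success = processed - failures."""
    return processed - failures
```

For example, full_metric_name("enqueued.default") gives the name a dashboard would query for the default queue under the default namespace, and full_metric_name("enqueued.default", "app1.faktory.") does the same for a multi-level namespace.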

Latency

Measuring queue latency is more expensive than the other metrics: Faktory has to peek at the first job in the queue, parse its JSON, and read the enqueued_at element. Because of this extra cost, Faktory gathers latency only for the queues you opt into:

# statsd.toml
[statsd]
  queueLatency = ["default", "bulk"]

Name	Type	Description	Tags
latency.{queue}	Gauge	Time elapsed since the first job in the queue was enqueued	n/a
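
The latency calculation described above can be sketched as follows. This is an illustration, not Faktory's implementation: the function name is hypothetical, enqueued_at is assumed to be an RFC 3339 UTC timestamp with a trailing "Z" and at most six fractional digits, and the result is expressed in seconds, although this page does not state the unit of the reported gauge.

```python
import json
from datetime import datetime, timezone


def queue_latency_seconds(job_json: str, now: datetime) -> float:
    """Return the time between `now` and the job's enqueued_at, in seconds."""
    job = json.loads(job_json)
    # Assumption: enqueued_at is RFC 3339 in UTC with a trailing "Z".
    # Replacing "Z" keeps this working on Python versions before 3.11.
    enqueued_at = datetime.fromisoformat(job["enqueued_at"].replace("Z", "+00:00"))
    return (now - enqueued_at).total_seconds()


# Usage: the first job in the queue was enqueued five seconds before `now`.
first_job = '{"jid": "abc123", "enqueued_at": "2024-01-01T00:00:00.000000Z"}'
now = datetime(2024, 1, 1, 0, 0, 5, tzinfo=timezone.utc)
latency = queue_latency_seconds(first_job, now)
```

Passing now explicitly keeps the function deterministic. A live caller would pass datetime.now(timezone.utc).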

Job Execution

Basic job execution metrics are sent in real-time as jobs are processed.

Name	Type	Description	Tags
jobs.count	Counter	Total job execution count, increments upon ACK/FAIL	"queue:{queue}", "jobtype:{type}"
jobs.failed	Counter	Total job failure count, increments upon FAIL	"queue:{queue}", "jobtype:{type}"
jobs.perform	Gauge (time)	Time between FETCH and ACK	"queue:{queue}", "jobtype:{type}"
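
To inspect these metrics while debugging, it can help to parse the lines a statsd server receives. The sketch below assumes the DogStatsD text format (name:value|type, optionally followed by |@rate and |#tag1,tag2). This page only says the tags are DataDog-style, so the wire format is an assumption, and both the function name and the SomeJob job type are hypothetical.

```python
def parse_datagram(line: str) -> dict:
    """Parse one DogStatsD-style line: name:value|type[|@rate][|#tag1,tag2]."""
    head, *sections = line.strip().split("|")
    name, sep, value = head.partition(":")
    if not sep or not sections:
        raise ValueError(f"not a statsd metric line: {line!r}")
    metric = {
        "name": name,
        "value": float(value),
        "type": sections[0],  # e.g. "c" for counter, "g" for gauge
        "sample_rate": 1.0,
        "tags": [],
    }
    for section in sections[1:]:
        if section.startswith("@"):
            metric["sample_rate"] = float(section[1:])
        elif section.startswith("#"):
            metric["tags"] = section[1:].split(",")
    return metric


# Usage with a hypothetical line for the jobs.count counter.
sample = parse_datagram("faktory.jobs.count:1|c|#queue:default,jobtype:SomeJob")
```

The tags list can then be filtered on "queue:{queue}" or "jobtype:{type}" to break counts down per queue or per job type.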
