
Graphite

Vladislav Alekseev edited this page Mar 15, 2023 · 19 revisions

Setting Up Graphite Analytics

To enable Graphite event reporting, provide the necessary settings, such as the socket and the event name prefix, in the queue configuration.
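To make the role of the socket setting concrete, here is a minimal sketch of how a single metric is commonly written to a Graphite (Carbon) plaintext listener: one line of the form `<path> <value> <timestamp>`. The function names, the example path, and the values are hypothetical, and this page does not state which wire format Emcee itself uses, so treat this as an illustration rather than a description of Emcee's implementation.

```python
import socket
import time


def plaintext_line(path, value, timestamp=None):
    """Format one metric as a Carbon plaintext line ending with a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, line):
    """Send one already formatted line over TCP (not called in this sketch)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


# Hypothetical path and value; real paths follow the layouts listed on this page.
example = plaintext_line(
    "emcee.queue.jobs.count.queue_host.1_0_0.reserved.reserved.reserved",
    3,
    1678838400,
)
print(example, end="")
```

Carbon's plaintext listener conventionally uses TCP port 2003, but the host and port that apply to your setup are whatever you put in the queue configuration.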

Events

All events are prefixed with a string taken from the queue configuration. Components named reserved are placeholders that may carry additional information in future releases.
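As an illustration of the naming scheme, the sketch below assembles a metric path from a prefix, the event-specific components, and the trailing reserved placeholders. The helper names and the sample values are hypothetical. Replacing dots inside a component with underscores is an assumption made here so that a value such as a host name does not add extra path levels; this page does not say how Emcee sanitizes components.

```python
def sanitize(component):
    """Assumption: keep a component on one path level by replacing dots."""
    return str(component).replace(".", "_")


def metric_path(prefix, components, reserved_count):
    """Join prefix, sanitized components, and trailing 'reserved' placeholders."""
    parts = [prefix]
    parts.extend(sanitize(c) for c in components)
    parts.extend(["reserved"] * reserved_count)
    return ".".join(parts)


# [prefix].queue.buckets.enqueue.<emcee_version>.<queue_host>.reserved.reserved
path = metric_path(
    "emcee",  # hypothetical prefix
    ["queue", "buckets", "enqueue", "1.2.3", "queue.example.com"],
    reserved_count=2,
)
print(path)
```

Keeping the placeholders in the path means that dashboards built on a fixed number of path components keep working if a future release starts filling them in.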

Queue-side events

Enqueueing and dequeueing

[prefix].queue.buckets.dequeue.<worker_id>.<emcee_version>.<queue_host>.reserved.reserved

An event of dequeueing a single bucket by the given worker_id.

[prefix].queue.tests.dequeue.<worker_id>.<emcee_version>.<queue_host>.reserved.reserved

An event of dequeueing a number of tests by the given worker_id. Test count depends on how many tests are in the dequeued bucket.

[prefix].queue.buckets.enqueue.<emcee_version>.<queue_host>.reserved.reserved

An event of bucket enqueueing. Happens when you schedule a job with tests. Each bucket may contain an arbitrary number of tests.

[prefix].queue.tests.enqueue.<emcee_version>.<queue_host>.reserved.reserved

An event of test enqueueing. Happens when you schedule a job with tests.

Job state

[prefix].queue.jobs.state.dequeued.<queue_host>.<job_id>.<emcee_version>.reserved.reserved.reserved

Current number of dequeued buckets for a given job_id on a queue running on queue_host, with emcee_version version. This metric can indicate how many devices (simulators) are busy performing the given job.

[prefix].queue.jobs.state.enqueued.<queue_host>.<job_id>.<emcee_version>.reserved.reserved.reserved

Current number of enqueued buckets for a given job_id on a queue running on queue_host, with emcee_version version. While workers are performing tests from the given job_id, this metric will decrease.

Global queue state

[prefix].queue.jobs.count.<queue_host>.<emcee_version>.reserved.reserved.reserved

An event indicating how many jobs are scheduled on the queue. When a job completes, it is removed from the queue. Each job usually represents a single test run, so this metric can be used to estimate how many test runs are happening at the same time.

[prefix].queue.state.dequeued.<queue_host>.<emcee_version>.reserved.reserved.reserved

This is a global metric for all jobs across the queue running on queue_host and with emcee_version version. This metric can indicate how many devices (simulators) are busy executing jobs from the specified queue. If the queue is empty and all devices are idle, this metric has a value of 0. If the queue has some buckets enqueued and all devices are busy, this metric indicates how many devices are processing buckets.

[prefix].queue.state.enqueued.<queue_host>.<emcee_version>.reserved.reserved.reserved

This is a global metric for all jobs across the queue running on queue_host and with emcee_version version. This metric indicates how many buckets are enqueued. Each bucket may have an arbitrary number of tests in it.

Worker state

[prefix].queue.workers.utilizable.count.<emcee_version>.<queue_host>.reserved.reserved.reserved

Indicates how many workers can be utilized by the given queue host and version. This metric shows how the worker sharing feature is behaving.

[prefix].queue.worker.status.<worker_id>.<status>.<emcee_version>.reserved

This event indicates the status of worker_id for a queue with emcee_version version. The queue host should be part of this event as well, but it is not included yet. A worker can have one of several statuses, e.g. notRegistered when the worker has failed to start (or is still starting), disabled when the worker has been explicitly disabled, silent when the worker has started but became unresponsive, and alive when the worker appears to be working normally.
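When consuming these events, the worker and its status have to be read back out of the metric path. The sketch below does that for the layout shown above and counts workers per status. The function name and the sample paths are hypothetical, and it assumes that worker_id, status, and emcee_version contain no dots; the prefix may contain dots, so the parser looks for the `queue.worker.status` marker instead of relying on fixed positions.

```python
from collections import Counter

MARKER = ["queue", "worker", "status"]


def parse_worker_status(path):
    """Return worker_id, status and emcee_version from a path, or None."""
    parts = path.split(".")
    for i in range(len(parts) - len(MARKER) + 1):
        if parts[i:i + len(MARKER)] == MARKER:
            fields = parts[i + len(MARKER):]
            if len(fields) < 3:
                return None
            worker_id, status, version = fields[:3]
            return {
                "worker_id": worker_id,
                "status": status,
                "emcee_version": version,
            }
    return None


# Hypothetical sample paths with a dotted prefix.
paths = [
    "ci.emcee.queue.worker.status.worker01.alive.1_0_0.reserved",
    "ci.emcee.queue.worker.status.worker02.silent.1_0_0.reserved",
    "ci.emcee.queue.worker.status.worker03.alive.1_0_0.reserved",
]
parsed = [p for p in map(parse_worker_status, paths) if p is not None]
status_counts = Counter(p["status"] for p in parsed)
print(dict(status_counts))
```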

Bucket metrics

[prefix].queue.buckets.stuck.<reason>.<worker_id>.<emcee_version>.<queue_host>.reserved.reserved

This event happens when a queue running on queue_host and with emcee_version version detects a stuck bucket on worker worker_id and re-enqueues it. reason indicates why the queue decided to re-enqueue the bucket: either the worker process is silent (e.g. it crashed), or the worker lost the bucket, i.e. it switched to processing another bucket without returning the test results.

[prefix].queue.buckets.time_to_dequeue.<emcee_version>.<queue_host>.reserved.reserved

This metric represents the duration between enqueueing a bucket and a worker dequeueing it.

Statsd: [prefix].bucket.duration.<queue_host>.<emcee_version>.<persistent_metrics_job_id>.reserved.reserved.reserved.reserved.reserved

This statsd metric indicates how long it took to process a single bucket that was part of a job with persistent_metrics_job_id on the queue queue_host. Example: imagine we create a job to run some unit tests (persistent_metrics_job_id == "unittests"). This metric will be used to report durations for all buckets that were created for that job.
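For the "unittests" example above, a StatsD timer datagram would look roughly like the output of the sketch below. It uses the standard StatsD line format `<name>:<value>|ms`. Whether Emcee reports durations as timers in milliseconds is an assumption, and the prefix, host, and version values are hypothetical.

```python
def statsd_timing_line(name, duration_seconds):
    """Format a duration as a StatsD timer line with a millisecond value."""
    millis = round(duration_seconds * 1000)
    return f"{name}:{millis}|ms"


# [prefix].bucket.duration.<queue_host>.<emcee_version>.<persistent_metrics_job_id>
# followed by five reserved placeholders.
name = ".".join(
    ["emcee", "bucket", "duration", "queue_host", "1_0_0", "unittests"]
    + ["reserved"] * 5
)
line = statsd_timing_line(name, 12.5)
print(line)
```

Because persistent_metrics_job_id stays the same across runs of the same kind of job, all buckets of every "unittests" run report under one metric name, which makes their durations comparable over time.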

Test metrics

[prefix].tests.time_to_start.<test_class_name>.<test_method_name>.<emcee_version>.<queue_host>.reserved.reserved

This event indicates the duration between the moment the test test_class_name/test_method_name is enqueued and the moment it starts executing, i.e. how long the test was held in the queue. Generally it does not matter which exact test has been held. Holding tests in the queue for a long time means that workers are not serving the queue fast enough and that jobs are progressing more slowly for the users.

[prefix].test.duration.<result>.<worker_host>.<test_class>.<test_method>.<emcee_version>.reserved.reserved

This event indicates a duration of the given test after it has been executed on worker_host, and a general test result (success/failure/lost).

Statsd: [prefix].test.duration.<worker_host>.<emcee_version>.<persistent_metrics_job_id>.<result>.reserved.reserved.reserved.reserved

This statsd metric has a value of test duration. The test was a part of a job with persistent_metrics_job_id.

[prefix].test.started.<worker_host>.<test_class>.<test_method>.<emcee_version>.reserved.reserved

This event indicates that a given test started.

[prefix].test.finished.<result>.<worker_host>.<test_class>.<test_method>.<emcee_version>.reserved.reserved

This event indicates that a given test finished with a given result.

[prefix].test.preflight.<worker_host>.<emcee_version>.reserved.reserved.reserved

This event indicates the duration between the moment the runner process (e.g. xcodebuild) starts and the moment the first test starts executing. It measures the test runner overhead before any test actually runs.

[prefix].test.postflight.<worker_host>.<emcee_version>.reserved.reserved.reserved

This event indicates the duration between the moment the last test in a bucket finishes and the moment the test runner (e.g. xcodebuild) terminates. It measures the test runner overhead after all tests have run.

[prefix].test.between_tests.duration.<worker_host>.<emcee_version>.reserved.reserved.reserved

This metric indicates the time between the finish of one test and the start of the next one. This value may indicate the overhead of test preparation or teardown.
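The preflight, postflight, and between-tests metrics described above all come from the same timeline of a single runner invocation. The sketch below derives them from timestamps. The function name, the input shape, and the sample numbers are hypothetical; this illustrates the definitions and is not Emcee's implementation.

```python
def runner_overheads(runner_started, runner_finished, tests):
    """Compute overheads for one runner invocation.

    tests is a list of (started_at, finished_at) pairs in seconds,
    ordered by start time and not overlapping.
    """
    if not tests:
        # No test ran, so none of the three metrics is defined.
        return None
    first_start = tests[0][0]
    last_finish = tests[-1][1]
    # Gap between each test's finish and the next test's start.
    gaps = [
        next_start - prev_finish
        for (_, prev_finish), (next_start, _) in zip(tests, tests[1:])
    ]
    return {
        "preflight": first_start - runner_started,
        "postflight": runner_finished - last_finish,
        "between_tests": gaps,
    }


timeline = [(3.0, 5.0), (5.5, 7.0), (7.25, 9.0)]
overheads = runner_overheads(0.0, 10.0, timeline)
print(overheads)
```

With this timeline the runner spends 3.0 seconds before the first test, 1.0 second after the last one, and 0.5 and 0.25 seconds between tests.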

[prefix].test.useless_runner_invocation.<host>.<emcee_version>.reserved.reserved.reserved.reserved

This metric is reported when Emcee attempts to run a set of tests using a test runner (xcodebuild), but the test runner does not execute any test. Emcee may terminate the runner because of timeouts, or the runner may exit silently. In either case an attempt to run some tests was made and produced nothing. The value of this metric is the amount of time that was wasted.

Job metrics

Statsd: [prefix].job.preparation.<queue_host>.<emcee_version>.<persistent_metrics_job_id>.<success|failure>.reserved.reserved.reserved.reserved.reserved

The value of this statsd metric is how long it took to prepare and schedule a job with persistent_metrics_job_id on the queue_host.

Preparation consists of performing test discovery and scheduling the tests on a remote queue.

Statsd: [prefix].job.duration.<queue_host>.<emcee_version>.<persistent_metrics_job_id>.reserved.reserved.reserved.reserved.reserved

The value of this statsd metric is how long it took to process a job with persistent_metrics_job_id on the queue queue_host.

Simulator metrics

[prefix].simulator.allocation.duration.<worker_host>.<allocation_outcome>.<emcee_version>.reserved.reserved.reserved

This metric represents how long it took Emcee to prepare a simulator for test execution. The duration may or may not include creation, boot, and patching time. It grows if you delete simulators more often, and it tends to zero if the simulator was already ready for test execution.

[prefix].simulator.action.duration.<action>.<worker_host>.<device_type>.<runtime>.<is_success>.<emcee_version>.reserved.reserved.reserved

This metric represents the duration of individual simulator actions such as create, boot, shutdown, and delete.

Test discovery metrics

[prefix].runtime_dump.test_case_count.<test_bundle_name>.<emcee_version>.reserved.reserved.reserved

This metric indicates how many test cases (XCTestCase subclasses) were discovered in a test_bundle_name test bundle.

[prefix].runtime_dump.test_count.<test_bundle_name>.<emcee_version>.reserved.reserved.reserved

This metric indicates how many tests (test* methods) were discovered in a test_bundle_name test bundle.

Statsd: [prefix].test.discovery.duration.<host>.<emcee_version>.<persistent_metrics_job_id>.<success|failure>.reserved.reserved.reserved

The value of this statsd metric is the duration of the test discovery process performed on a host for a persistent_metrics_job_id job.
