Host Health Monitor

Jeff Hollan edited this page Jun 14, 2019 · 15 revisions

The Host Health Monitor feature of the Functions Runtime monitors several performance counters whose limits are imposed by the VM sandbox. Its goal is to temporarily stop the host from taking on more work when any counter is about to exceed its threshold. This lets the host avoid hitting hard sandbox limits, which could cause a hard shutdown, and lets it gracefully complete in-progress work while waiting for the counters to return to normal. The performance counters currently monitored are:

  • Connections: Number of outbound connections (limit is 600 active, 1200 total). For information on handling connection limits, see Managing Connections.
  • Threads: Number of threads (limit is 512).
  • Processes: Number of child processes (limit is 32).
  • NamedPipes: Number of named pipes (limit is 128).
  • Sections: Related to file create operations. The underlying resource is Named Shared Memory sections created by CreateFileMapping calls (limit is 256).

Note that the limits above are the hard limits enforced by the sandbox. The actual thresholds used by the monitor are a fraction of these maximums (default is 0.80, i.e. 80%). When one or more counters exceed their thresholds, the host is stopped until the counter values return to normal. The Web App continues to run, but internally the host has been stopped, and no new functions will be run. If the Function App is scaled out to multiple instances, the other instances continue to run and pick up the workload. Once the counter values return to normal, the host automatically starts processing work again. If the counter values do not recover (see the healthCheckWindow and healthCheckThreshold settings below), the App Domain is recycled in an attempt to recover.
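To make the arithmetic concrete, here is a minimal Python sketch of how the effective thresholds follow from the hard limits. It is an illustration only, not the runtime's implementation: the names SANDBOX_LIMITS, effective_thresholds and exceeded_counters are hypothetical, the "active" figure is assumed for Connections, and the strict greater-than comparison is an assumption.

```python
# Hard sandbox limits as listed above. For Connections, the "active"
# figure (600) is assumed here; the runtime may track this differently.
SANDBOX_LIMITS = {
    "Connections": 600,
    "Threads": 512,
    "Processes": 32,
    "NamedPipes": 128,
    "Sections": 256,
}


def effective_thresholds(counter_threshold: float = 0.80) -> dict:
    """Scale each hard limit by the counterThreshold fraction."""
    return {name: limit * counter_threshold for name, limit in SANDBOX_LIMITS.items()}


def exceeded_counters(current: dict, counter_threshold: float = 0.80) -> list:
    """Return the names of counters whose current value is above its threshold.

    Assumption: a counter is unhealthy when value > limit * counter_threshold.
    Counters not present in SANDBOX_LIMITS are ignored.
    """
    thresholds = effective_thresholds(counter_threshold)
    return [
        name
        for name, value in current.items()
        if name in thresholds and value > thresholds[name]
    ]
```

With the default of 0.80, the monitor would under these assumptions react at roughly 480 connections or 409 threads, well before the hard limits of 600 and 512 are reached.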

If your Function App is hitting these thresholds, you'll see errors like "Host thresholds exceeded: [Connections]" in the logs, where the brackets list the counters that were exceeded. If this happens often, examine the offending function(s) to ensure that they use resources appropriately and are throttled correctly. For example, is your function code opening a large or unbounded number of outgoing connections?
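One common way to keep outgoing work bounded is to cap how many calls are in flight at once. The following Python sketch shows the idea with an asyncio semaphore. It is a generic illustration under stated assumptions, not part of the Functions Runtime: run_bounded and max_concurrent are hypothetical names, and each job stands in for whatever outbound call your function makes.

```python
import asyncio


async def run_bounded(jobs, max_concurrent: int = 50):
    """Run zero-argument coroutine functions with a cap on concurrency.

    `jobs` is an iterable of callables that each return a coroutine,
    for example a function that performs one outbound request.
    At most `max_concurrent` of them run at the same time.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Wait for a free slot before starting the job, and release it after.
        async with semaphore:
            return await job()

    # Results are returned in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Choosing a cap comfortably below the connection threshold leaves headroom for connections opened by other functions on the same instance. Reusing a single client object across invocations, as described in Managing Connections, addresses the same problem from the other side.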

The feature is currently only active on the Consumption plan, where these sandbox limits exist. It is enabled by default, but can be disabled or configured via the healthMonitor section of host.json, for example:

{
    "healthMonitor": {
        "enabled": true,
        "healthCheckInterval": "00:00:10",
        "healthCheckWindow": "00:02:00",
        "healthCheckThreshold": 6,
        "counterThreshold": 0.80
    }
}

Description of settings:

  • enabled: Whether the feature is enabled. Default is true.
  • healthCheckInterval: The time interval between the periodic background health checks. Default is 10 seconds.
  • healthCheckWindow: A sliding time window used in conjunction with the healthCheckThreshold setting (see below). Default is 2 minutes.
  • healthCheckThreshold: Maximum number of times the health check can fail within the healthCheckWindow before a host recycle is initiated. Default is 6.
  • counterThreshold: The threshold at which a performance counter will be considered unhealthy. Default is 0.80.
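
The interaction between healthCheckWindow and healthCheckThreshold can be sketched as a sliding-window failure counter. The Python below is a simplified model for illustration, not the runtime's actual code: the HealthWindow class is hypothetical, and it assumes a recycle is triggered once the number of failed checks inside the window exceeds the threshold.

```python
from collections import deque
from datetime import datetime, timedelta


class HealthWindow:
    """Tracks failed health checks inside a sliding time window."""

    def __init__(self, window=timedelta(minutes=2), threshold=6):
        self.window = window        # corresponds to healthCheckWindow
        self.threshold = threshold  # corresponds to healthCheckThreshold
        self._failures = deque()    # timestamps of recent failed checks

    def record_failure(self, now: datetime) -> bool:
        """Record a failed check and report whether a recycle would start.

        Assumption: a recycle is initiated when more than `threshold`
        failures fall within the last `window` of time.
        """
        self._failures.append(now)
        cutoff = now - self.window
        # Drop failures that have slid out of the window.
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) > self.threshold
```

With the values from the example configuration (a check every 10 seconds, a 2 minute window, and a threshold of 6), this model would initiate a recycle on the seventh failed check if the checks fail back to back, that is, about a minute after the first failure.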