Description
Using the experimental ops file (I know), Dopplers are still being overloaded
Summary
We were able to stop a lot of noise from verbose apps, but there was still log loss in other apps.
Steps to Reproduce
We checked to ensure that the logging levels were generally quite low, so the Dopplers were not overloaded before we started.
We applied this setting (diego.executor.max_log_lines_per_second) to a CF environment with a value of 100 and then with a value of 50. We also made a few simple Ruby apps, each a for loop printing an incrementing number, with a variety of sleeps per cycle from a 0.1 second wait down to no delay at all.
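The test apps described above can be sketched roughly like this (an assumed shape, not the exact apps we deployed): a loop emitting one incrementing line per iteration, with the per-line sleep as the knob.

```ruby
require 'stringio'

# Minimal log-generating app: one incrementing line per iteration.
# `delay` is the per-line sleep in seconds (0.1 down to 0 for the
# noisiest case); `count` bounds the run so the example terminates.
def emit_lines(count, delay)
  count.times do |i|
    puts i                     # each line counts against the rate limit
    $stdout.flush              # flush so lines are emitted promptly
    sleep delay if delay > 0   # 0 means flood stdout as fast as possible
  end
end

# A run with no delay should trip a 100 log-lines/sec limit almost
# immediately; delay = 0.1 stays safely under it.
emit_lines(5, 0) if $PROGRAM_NAME == __FILE__
```

In a real push the loop would be unbounded; the bounded count here is just to keep the sketch runnable.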
We can then see errors in the logs like this...
2020-05-01T13:58:21.82+0100 [APP/PROC/WEB/0] OUT app instance exceeded log rate limit (100 log-lines/sec) set by platform operator
Which is ace
We found that there were a lot of dropped logs from the busy app, as we would expect. This is good. However, we also saw a small number of drops from the slower-paced apps, which suggests the rate limiting on the diego-cell was not fully effective.
We also found that the Dopplers were still seeing lots of drops. The Dopplers should not have seen as many drops, since the excess logs should have been dropped before they ever reached the Doppler component.
We may be misunderstanding what we're seeing, but then again, it may serve us right for using an experimental ops file.
Diego repo
executor
Environment Details
cf-deployment/v12.34.0
diego-release/v2.44.0
Possible Causes or Fixes (optional)
Log streaming holding things in a local cache on the cell?
Other metrics in the firehose are also being emitted, so although the application logs are reduced, those other metrics are still flowing at full rate and are not subject to the limit.