Add caching for the topics metadata #75
Sorry for the big PR. I upgraded sarama to v1.19.0 because the client metadata was being cleared. Even if a larger … Setting the …
@jorgelbg do you think this could fix the problem of gaps in the data series? I am now testing kafka-exporter in our staging environment and this is happening all over again. What I have found is that these gaps occur in the same time frames in which the metadata gets refreshed (note that these I/O timeouts also happen occasionally): For me this is a major issue that prevents me from using this exporter in our alerting chain. I've posted similar comments before but haven't seen any activity for quite a while...
Looks good to me. Sorry for the delayed response.
@danielqsj don't worry, thanks for taking the time! @bojleros this could help (I think) because it reduces the extra time spent refreshing the metadata, especially if you have a big cluster. Increasing the scrape interval in Prometheus could also help. Reading the lag/offset data and sending the response back can also be time-consuming if you have a lot of topics.
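On the cost point above: consumer lag is typically computed per partition as the partition's newest (high-watermark) offset minus the group's committed offset. Every scrape therefore repeats that work for each group, topic and partition, on top of the offset requests needed to get the numbers. The Go sketch below shows only the arithmetic. `partitionOffsets` and `consumerLag` are hypothetical names, and fetching the offsets from Kafka is left out. Skipping uncommitted partitions and clamping negative values are choices made for this sketch, not necessarily what kafka_exporter does.

```go
package main

import "fmt"

// partitionOffsets is a hypothetical per-partition snapshot (not kafka_exporter's type).
type partitionOffsets struct {
	HighWatermark   int64 // newest offset of the partition
	CommittedOffset int64 // consumer group's committed offset; negative = nothing committed (assumption)
}

// consumerLag computes the lag per partition and the total for one group and topic.
func consumerLag(parts map[int32]partitionOffsets) (map[int32]int64, int64) {
	perPartition := make(map[int32]int64, len(parts))
	var total int64
	for p, o := range parts {
		if o.CommittedOffset < 0 {
			continue // nothing committed yet: skipped in this sketch
		}
		lag := o.HighWatermark - o.CommittedOffset
		if lag < 0 {
			lag = 0 // offsets read at slightly different times can cross; clamp to zero
		}
		perPartition[p] = lag
		total += lag
	}
	return perPartition, total
}

func main() {
	parts := map[int32]partitionOffsets{
		0: {HighWatermark: 120, CommittedOffset: 100},
		1: {HighWatermark: 80, CommittedOffset: 80},
		2: {HighWatermark: 50, CommittedOffset: -1},
	}
	per, total := consumerLag(parts)
	fmt.Println(per[0], per[1], total) // prints "20 0 20"
}
```

The loop body is cheap. What adds up with many topics is that it runs for every partition of every group on each scrape, after the corresponding offsets have been requested from the brokers.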
I am going to start my tests tomorrow morning. Our clusters are rather small, and their performance is tightly monitored. I am sure we have enough resources and are not hitting the fd limit. We also have our own tool (https://github.com/msales/kage) running next to the kafka_exporters, and these gaps are what stopped us from using kafka_exporter in production. I am not a kage developer, but maybe you would be interested in comparing the code of both projects in search of possible improvements. Apart from that, I noticed that kafka_exporter produces new values only once every minute. Would it be feasible to have it refresh values every 15 seconds to fit our default scrape interval? Kind regards,

This implements the feature mentioned in #74: a configurable caching interval for the topics metadata. Only the metadata is cached. Offsets and lag are still requested and calculated every time, so this should not affect the metrics.
Depending on the configured interval (30s by default), a newly added topic may not be picked up immediately.