Sonia Hamilton edited this page May 9, 2019 · 2 revisions

Some discussions and questions I've had that may be of interest:

3/May/2019

@sonia: https://github.com/soniah/gosnmp/issues/191

We are using gosnmp to poll millions of devices. We have noticed that as we scan more and more devices, timeouts start increasing while CPU and memory usage stay low.

With a 4 vCPU / 16 GB VM we can poll around 1500 devices per second per VM; when we push for higher throughput, timeout errors occur more often.

Any recommendations on ways to use this lib effectively under this kind of load?

Current optimizations on our end:

  • setting Linux file descriptor limits to a couple of hundred thousand (we never reach this limit)
  • using a sync.Pool to re-use GoSNMP instances
  • keeping retries at 1 with a 6-second timeout; higher retry counts cause requests to take longer
  • leveraging GetBulk and BulkWalk as much as possible
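The sync.Pool re-use mentioned in the list above might be sketched roughly as follows. This is a minimal stdlib-only illustration: `snmpClient` and `pollAll` are hypothetical stand-ins, and the real gosnmp calls (`Connect`, `BulkWalk`, etc.) are elided as comments rather than asserted.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// snmpClient stands in for *gosnmp.GoSNMP; the real struct carries the
// socket, community string, version, and so on.
type snmpClient struct {
	Timeout time.Duration
	Retries int
}

// clientPool hands out re-usable clients instead of allocating one per
// device, matching the sync.Pool optimization described above.
var clientPool = sync.Pool{
	New: func() interface{} {
		return &snmpClient{Timeout: 6 * time.Second, Retries: 1}
	},
}

// pollAll polls every target with a pooled client and returns how many
// targets were processed. The actual SNMP calls are elided as comments.
func pollAll(targets []string) int64 {
	var done int64
	var wg sync.WaitGroup
	for _, t := range targets {
		wg.Add(1)
		go func(target string) {
			defer wg.Done()
			c := clientPool.Get().(*snmpClient)
			defer clientPool.Put(c)
			// c.Target = target; c.Connect(); defer c.Conn.Close()
			// c.BulkWalk("1.3.6.1.2.1", handler)
			_ = target
			atomic.AddInt64(&done, 1)
		}(t)
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(pollAll([]string{"10.0.0.1", "10.0.0.2"}))
}
```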

Trying to figure out what the bottleneck might be if it isn't CPU, memory, or network bandwidth.

@sonia: thanks for your question. Unfortunately I don't have access to a large network of devices at the moment, but I've posted the above in case others have suggestions.

@ps0296: Is there a way to re-use sockets instead of opening and closing one for each device?
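For context on what socket re-use could look like: at the OS level a single unconnected UDP socket can talk to many destinations via WriteTo, choosing the target per packet, so no per-device open/close is required. This stdlib-only sketch (not gosnmp's API; `demo` and the "ping" payloads are invented for illustration) shows the idea a shared-socket transport would build on:

```go
package main

import (
	"fmt"
	"net"
)

// demo sends to two targets from one shared, unconnected UDP socket and
// returns what each target received.
func demo() (string, string) {
	// Two stand-in "devices": local UDP listeners.
	a, _ := net.ListenPacket("udp", "127.0.0.1:0")
	defer a.Close()
	b, _ := net.ListenPacket("udp", "127.0.0.1:0")
	defer b.Close()

	// One socket shared across all targets: WriteTo picks the
	// destination per packet, so no per-device open/close is needed.
	sock, _ := net.ListenPacket("udp", "127.0.0.1:0")
	defer sock.Close()

	sock.WriteTo([]byte("ping-a"), a.LocalAddr())
	sock.WriteTo([]byte("ping-b"), b.LocalAddr())

	buf := make([]byte, 64)
	n, _, _ := a.ReadFrom(buf)
	ra := string(buf[:n])
	n, _, _ = b.ReadFrom(buf)
	rb := string(buf[:n])
	return ra, rb
}

func main() {
	fmt.Println(demo())
}
```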

@int3rlop3r: Found this interesting comment; not sure if it's the same issue. https://github.com/microsoft/ethr/blob/master/ethr.go#L206

```go
// Set GOMAXPROCS to 1024 as running large number of goroutines in a loop
// to send network traffic results in timer starvation, as well as unfair
// processing time across goroutines resulting in starvation of many TCP
// connections. Using a higher number of threads via GOMAXPROCS solves this
// problem.
//
runtime.GOMAXPROCS(1024)
```
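A complementary approach to raising GOMAXPROCS is to bound how many pollers are in flight at once with a counting semaphore, so goroutines don't starve each other of scheduler time. A stdlib sketch (helper names `boundedPoll` and `peakConcurrency` are ours, not gosnmp's):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// boundedPoll runs poll once per target but allows at most limit calls
// to be in flight at once.
func boundedPoll(targets []string, limit int, poll func(string)) {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for _, t := range targets {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot (blocks at the limit)
		go func(target string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			poll(target)
		}(t)
	}
	wg.Wait()
}

// peakConcurrency polls n dummy targets and reports the highest number
// of polls observed in flight at once; it can never exceed limit.
func peakConcurrency(n, limit int) int64 {
	var inFlight, peak int64
	boundedPoll(make([]string, n), limit, func(string) {
		cur := atomic.AddInt64(&inFlight, 1)
		for {
			p := atomic.LoadInt64(&peak)
			if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
				break
			}
		}
		atomic.AddInt64(&inFlight, -1)
	})
	return peak
}

func main() {
	fmt.Println(peakConcurrency(100, 8) <= 8)
}
```

Tuning the limit trades throughput against the timeout rate described above, without touching the runtime's thread count.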