SPIKE: Secretless benchmark metrics are defined #1398

Closed
4 tasks
doodlesbykumbi opened this issue Apr 6, 2021 · 2 comments

Comments

@doodlesbykumbi
Contributor

doodlesbykumbi commented Apr 6, 2021

Overview

For this effort, our focus is on TCP connectors, and only on the "streaming" segment of the connection (post-authentication). The metrics we are interested in are therefore:

  1. TCP streaming latency
  2. TCP streaming throughput

The goals here are:

  1. To have a common language for Secretless benchmark metrics.
  2. To capture the code changes necessary to measure a given metric.
  3. To make (2) flexible enough to cater to new metrics.

To provide broader context for our language around the metrics, we should elaborate on the pathways that different types of connections (TCP, HTTP, etc.) take and how Secretless handles them, making sure to describe the lifecycle of connections (auth vs. streaming, etc.). Diagrams and flow charts will be useful here. It would also help if such diagrams reference parts of the code.

For each metric, we should associate it with a metric type (e.g., a value recorder is appropriate for latency) and capture any metadata (correlating or otherwise).

Definition of done

  • A definition of each Secretless benchmark metric.
    • How and where the metric is measured during the lifecycle of a Secretless connection.
    • Where in the code changes would need to be made to measure it (goal 2 above).
    • What the associated metric type is in OpenTelemetry.
@doodlesbykumbi doodlesbykumbi changed the title Secretless benchmark metrics are defined SPIKE: Secretless benchmark metrics are defined Apr 6, 2021
@izgeri
Contributor

izgeri commented Apr 6, 2021

In the feature spec we are focused on TCP; can we focus on TCP here as well? Can we also ensure the scope is clear, e.g., right now we only need to measure post-connection streaming latency and throughput. In particular, from the doc we have:

Secretless latency under ‘light’ load and maximum throughput (Mbps) per single Secretless container has been measured. ‘Light’ load or operating profile is defined as above, where Secretless is communicating with a database cluster with variable data size per request.

I think in this card we are asking: how can we measure post-authentication handshake latency / throughput?

One other relevant requirement:

The telemetry data output by Secretless must be easy to query across an arbitrary period of time with at least a week of history, and related events should be tagged in a way that makes it easy to aggregate them.

So at a minimum, the data that's output has to be timestamped and tagged to indicate which specific connection the datapoint is relevant to.

@doodlesbykumbi
Contributor Author

doodlesbykumbi commented May 7, 2021

The outcome from this spike is captured in the telemetry branch, where there is a reference implementation of metric measurement, collection, export and analysis.


To make measurements, we start by noting that network I/O in Go is blocking: a read blocks its goroutine until data is available. We use io.Copy to implement unidirectional streaming; it takes a destination io.Writer and a source io.Reader as input. Bidirectional TCP streaming in Secretless (see duplexStream) is the result of two goroutines, each streaming (via io.Copy) in one direction between the client and target TCP connections. io.Copy handles all the reading, writing and buffering. The io.Copy for each direction blocks in its goroutine for the lifetime of the streaming.
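
As a rough illustration of the two-goroutine arrangement described above, here is a minimal sketch. duplexStreamSketch and relayOnce are hypothetical names made up for this example and are not the actual duplexStream code; net.Pipe stands in for real TCP connections so the sketch runs without a network.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// duplexStreamSketch streams bytes in both directions between two connections,
// one goroutine per direction. Each returned channel receives the result of
// the io.Copy for one direction once that direction finishes.
// Hypothetical stand-in for Secretless's duplexStream, not its actual code.
func duplexStreamSketch(client, target io.ReadWriter) (<-chan error, <-chan error) {
	clientErr := make(chan error, 1)
	targetErr := make(chan error, 1)

	go func() {
		// client -> target: blocks until client returns EOF or an error.
		_, err := io.Copy(target, client)
		clientErr <- err
	}()
	go func() {
		// target -> client: blocks until target returns EOF or an error.
		_, err := io.Copy(client, target)
		targetErr <- err
	}()
	return clientErr, targetErr
}

// relayOnce sends msg from a fake application through the sketch proxy and
// returns what the fake target received, plus the client -> target copy result.
func relayOnce(msg string) (string, error) {
	appSide, proxyClientSide := net.Pipe()
	proxyTargetSide, dbSide := net.Pipe()
	defer dbSide.Close() // unblocks the target -> client goroutine

	clientErr, _ := duplexStreamSketch(proxyClientSide, proxyTargetSide)

	go func() {
		appSide.Write([]byte(msg))
		appSide.Close() // EOF ends the client -> target io.Copy
	}()

	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(dbSide, buf); err != nil {
		return "", err
	}
	return string(buf), <-clientErr
}

func main() {
	got, err := relayOnce("ping")
	fmt.Println(got, err)
}
```

Because each direction lives entirely inside one goroutine, a Read on the source and the matching Write on the destination always happen on the same goroutine, which matters for where timing state can safely be kept.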

In order to take measurements, we must instrument each TCP connection instance by wrapping it to intercept reads and writes.
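
One way such a wrapper could look is sketched below. The type name notifyingReadWriter and the helper countThrough are hypothetical; the field and callback shapes are modeled on the ReadWriteNotifier used in the POC snippet further down, but the real implementation in the telemetry branch may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// notifyingReadWriter wraps an io.ReadWriter and reports every Read and Write
// to optional callbacks. Hypothetical sketch, not the POC's ReadWriteNotifier.
type notifyingReadWriter struct {
	readWriter io.ReadWriter
	onRead     func(bytesRead int, timeSpentBlocking time.Duration)
	onWrite    func(bytesWritten int, timeToHandoff time.Duration)
}

// Read delegates to the wrapped reader and reports how many bytes arrived
// and how long the call blocked.
func (n *notifyingReadWriter) Read(p []byte) (int, error) {
	start := time.Now()
	nBytes, err := n.readWriter.Read(p)
	if n.onRead != nil && nBytes > 0 {
		n.onRead(nBytes, time.Since(start))
	}
	return nBytes, err
}

// Write delegates to the wrapped writer and reports how many bytes were
// handed off and how long the call took.
func (n *notifyingReadWriter) Write(p []byte) (int, error) {
	start := time.Now()
	nBytes, err := n.readWriter.Write(p)
	if n.onWrite != nil && nBytes > 0 {
		n.onWrite(nBytes, time.Since(start))
	}
	return nBytes, err
}

// countThrough writes data into a wrapped in-memory buffer, drains it again,
// and returns the byte counts observed by the callbacks.
func countThrough(data []byte) (bytesRead, bytesWritten int) {
	w := &notifyingReadWriter{
		readWriter: &bytes.Buffer{},
		onRead:     func(n int, _ time.Duration) { bytesRead += n },
		onWrite:    func(n int, _ time.Duration) { bytesWritten += n },
	}
	w.Write(data)
	io.Copy(io.Discard, w)
	return bytesRead, bytesWritten
}

func main() {
	r, w := countThrough([]byte("hello"))
	fmt.Println(r, w)
}
```

Since the wrapper itself satisfies io.ReadWriter, it can be passed to io.Copy in place of the raw connection without changing the streaming code.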

Metrics definitions

The metrics are defined below. NOTE that each metric is e

TCP streaming latency

The time from when a source connection's Read unblocks on the arrival of some packet until the destination connection's Write for that packet returns.

NOTE: small experiments indicate that, in Secretless, a source connection's Read is always followed by the corresponding destination connection's Write.

TCP streaming throughput

The running total of the number of bytes written in either direction.

POC implementation

The POC implementation follows the ideas above.

The metrics are defined and labelled with connector-specific information such as the Secretless service name and connector type.

meter := metric.Must(s.meter)
labels := []attribute.KeyValue{
	attribute.String("service.name", config.Connector + ":" + "secretless"),
	attribute.String("secretless.service_name", config.Name),
	attribute.String("secretless.connector_name", config.Connector),
}
throughputCounter := meter.NewInt64Counter(
	"secretless.tcp.stream.bytes",
	metric.WithUnit(unit.Bytes),
).Bind(labels...)
latencyRecorder := meter.NewInt64ValueRecorder(
	"secretless.tcp.stream.latency",
).Bind(labels...)

The connections are instrumented to allow measurement of latency and throughput, and measurements are taken via OpenTelemetry.

// lastClientRead and lastTargetRead are time.Time variables declared in the
// enclosing scope. Each is written and read only by the goroutine that
// streams in its direction (client -> target or target -> client).
clientErrChan, destErrChan := duplexStream(
	&ReadWriteNotifier{
		readWriter: clientConn,
		onWrite: func(bytesWritten int, timeToHandoff time.Duration) {
			// clientWrite: completes a target -> client transfer.
			streamLatency := time.Since(lastTargetRead)
			ctx := context.Background()
			proxy.throughputCounter.Add(ctx, int64(bytesWritten))
			proxy.latencyRecorder.Record(ctx, streamLatency.Microseconds())
		},
		onRead: func(bytesRead int, timeSpentBlocking time.Duration) {
			// clientRead: starts a client -> target transfer.
			lastClientRead = time.Now()
		},
	}, &ReadWriteNotifier{
		readWriter: targetConn,
		onWrite: func(bytesWritten int, timeToHandoff time.Duration) {
			// targetWrite: completes a client -> target transfer.
			streamLatency := time.Since(lastClientRead)
			ctx := context.Background()
			proxy.throughputCounter.Add(ctx, int64(bytesWritten))
			proxy.latencyRecorder.Record(ctx, streamLatency.Microseconds())
		},
		onRead: func(bytesRead int, timeSpentBlocking time.Duration) {
			// targetRead: starts a target -> client transfer.
			lastTargetRead = time.Now()
		},
	},
)
