Add syslog batching poc implementation #491
File 1 (new, +117 lines):

```go
package syslog

import (
	"bytes"
	"crypto/tls"
	"time"

	"code.cloudfoundry.org/go-loggregator/v9/rpc/loggregator_v2"
	metrics "code.cloudfoundry.org/go-metric-registry"
	"code.cloudfoundry.org/loggregator-agent-release/src/pkg/egress"
)

const BATCHSIZE = 256 * 1024

type HTTPSBatchWriter struct {
	HTTPSWriter
	msgBatch     bytes.Buffer
	batchSize    int
	sendInterval time.Duration
	sendTimer    *TriggerTimer
	egrMsgCount  float64
}
```

Review comment: Can you please add tests for this writer?

Reply: Some tests are already added. Do we need tests for the things already covered by the httpsWriter (error handling and the like, which is the same anyway)?

```go
func NewHTTPSBatchWriter(
	binding *URLBinding,
	netConf NetworkTimeoutConfig,
	tlsConf *tls.Config,
	egressMetric metrics.Counter,
	c *Converter,
) egress.WriteCloser {
	client := httpClient(netConf, tlsConf)
	binding.URL.Scheme = "https" // reset the scheme for usage to a valid http scheme
	return &HTTPSBatchWriter{
		HTTPSWriter: HTTPSWriter{
			url:             binding.URL,
			appID:           binding.AppID,
			hostname:        binding.Hostname,
			client:          client,
			egressMetric:    egressMetric,
			syslogConverter: c,
		},
		batchSize:    BATCHSIZE,
		sendInterval: 1 * time.Second,
		egrMsgCount:  0,
	}
}
```

Review comment: What's the purpose of changing the scheme here?

Reply: The scheme is used to differentiate between the different endpoints (https and https-batched).

Review comment: Does mutating the scheme here affect the metrics emitted?

Reply: I will check whether the metrics are changed here.

```go
func (w *HTTPSBatchWriter) sendMsgBatch() error {
	currentEgrCount := w.egrMsgCount
	currentMsg := w.msgBatch.Bytes()

	w.egrMsgCount = 0
	w.msgBatch.Reset()

	return w.sendHttpRequest(currentMsg, currentEgrCount)
}
```

Review comment: It may not be safe to reset the buffer before using the slice in the request. The docs say: "The slice is valid for use only until the next buffer modification (that is, only until the next call to a method like Read, Write, Reset, or Truncate)."

```go
// Modified Write function
func (w *HTTPSBatchWriter) Write(env *loggregator_v2.Envelope) error {
	msgs, err := w.syslogConverter.ToRFC5424(env, w.hostname)
	if err != nil {
		return err
	}

	for _, msg := range msgs {
		w.msgBatch.Write(msg)
		w.egrMsgCount += 1
		w.startAndTriggerSend()
	}
	return nil
}

// TODO: Error back propagation. Errors are not looked at further down the call chain
func (w *HTTPSBatchWriter) startAndTriggerSend() {
	if w.sendTimer == nil || !w.sendTimer.Running() {
		w.sendTimer = NewTriggerTimer(w.sendInterval, func() {
			w.sendMsgBatch()
		})
	}
	if w.msgBatch.Len() >= w.batchSize {
		w.sendTimer.Trigger()
	}
}
```

Review comment: As an alternative to this approach you could do something similar to the ingress client example and have a channel to communicate with a sending goroutine that does a select on that channel and a timer channel.

Reply: As already explained here:

```go
type TriggerTimer struct {
	triggered bool
	execFunc  func()
}

func NewTriggerTimer(d time.Duration, f func()) *TriggerTimer {
	timer := &TriggerTimer{
		triggered: false,
		execFunc:  f,
	}
	timer.initWait(d)

	return timer
}

func (t *TriggerTimer) initWait(duration time.Duration) {
	timer := time.NewTimer(duration)
	go func() {
		<-timer.C
		t.Trigger()
	}()
}

func (t *TriggerTimer) Trigger() {
	if !t.triggered {
		t.triggered = true
		t.execFunc()
	}
}

func (t *TriggerTimer) Running() bool {
	return !t.triggered
}
```

Review comment: I think there's still a race here with more than one goroutine using `triggered`. You can try running the tests with the `-race` flag.

Reply: Thanks for the hint, will think about it and try to test it.
File 2 (new, +180 lines):

```go
package syslog_test

import (
	"bytes"
	"crypto/tls"
	"io"
	"net/http"
	"net/http/httptest"
	"time"

	"code.cloudfoundry.org/go-loggregator/v9/rfc5424"
	"code.cloudfoundry.org/go-loggregator/v9/rpc/loggregator_v2"
	metricsHelpers "code.cloudfoundry.org/go-metric-registry/testhelpers"
	"code.cloudfoundry.org/loggregator-agent-release/src/pkg/egress"
	"code.cloudfoundry.org/loggregator-agent-release/src/pkg/egress/syslog"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var triggered = 0
var string_to_1024_chars = "saljdflajsdssdfsdfljkfkajafjajlköflkjöjaklgljksdjlakljkflkjweljklkwjejlkfekljwlkjefjklwjklsdajkljklwerlkaskldgjksakjekjwrjkljasdjkgfkljwejklrkjlklasdkjlsadjlfjlkadfljkajklsdfjklslkdfjkllkjasdjkflsdlakfjklasldfkjlasdjfkjlsadlfjklaljsafjlslkjawjklerkjljklasjkdfjklwerjljalsdjkflwerjlkwejlkarjklalkklfsdjlfhkjsdfkhsewhkjjasdjfkhwkejrkjahjefkhkasdjhfkashfkjwehfkksadfjaskfkhjdshjfhewkjhasdfjdajskfjwehkfajkankaskjdfasdjhfkkjhjjkasdfjhkjahksdf"
```

Review comment: I find the name here a little confusing. The string is ~440 bytes in length.

Reply: Yes, this might be confusing.

```go
var _ = Describe("TriggerTimer testing", func() {
	BeforeEach(func() {
		triggered = 0
	})

	It("Timer triggered by call", func() {
		timer := syslog.NewTriggerTimer(10*time.Millisecond, trigger)
		timer.Trigger()
		// expect timer to be triggered and therefore stopped
		Expect(timer.Running()).To(BeFalse())
		Expect(triggered).To(Equal(1))
		time.Sleep(12 * time.Millisecond)
		// expect timer to stay stopped and not trigger func again
		Expect(timer.Running()).To(BeFalse())
		Expect(triggered).To(Equal(1))
	})

	It("Timer triggered by time elapsed", func() {
		timer := syslog.NewTriggerTimer(10*time.Millisecond, trigger)
		// expect timer to be running and untriggered
		Expect(timer.Running()).To(BeTrue())
		Expect(triggered).To(Equal(0))
		time.Sleep(12 * time.Millisecond)
		// expect timer to be triggered and stopped
		Expect(timer.Running()).To(BeFalse())
		Expect(triggered).To(Equal(1))
		// expect timer to not be able to be retriggered
		timer.Trigger()
		Expect(timer.Running()).To(BeFalse())
		Expect(triggered).To(Equal(1))
	})
})

func trigger() {
	triggered += 1
}

var _ = Describe("HTTPS_batch_testing", func() {
	var (
		netConf          syslog.NetworkTimeoutConfig
		skipSSLTLSConfig = &tls.Config{
			InsecureSkipVerify: true, //nolint:gosec
		}
		c      = syslog.NewConverter()
		drain  *SpyDrain
		b      *syslog.URLBinding
		writer egress.WriteCloser
	)
	string_to_1024_chars += string_to_1024_chars
```

Review comment: The string is doubled in length here to reach 880 bytes?

Reply: I did not comment on why 880 bytes suffice here: […]

```go
	BeforeEach(func() {
		drain = newBatchMockDrain(200)
		b = buildURLBinding(
			drain.URL,
			"test-app-id",
			"test-hostname",
		)
		writer = syslog.NewHTTPSBatchWriter(
			b,
			netConf,
			skipSSLTLSConfig,
			&metricsHelpers.SpyMetric{},
			c,
		)
	})

	It("testing simple appending of one log", func() {
		env1 := buildLogEnvelope("APP", "1", "message 1", loggregator_v2.Log_OUT)
		Expect(writer.Write(env1)).To(Succeed())
		env2 := buildLogEnvelope("APP", "2", "message 2", loggregator_v2.Log_OUT)
		Expect(writer.Write(env2)).To(Succeed())
		time.Sleep(2 * time.Second)

		Expect(drain.messages).To(HaveLen(2))
		expected := &rfc5424.Message{
			AppName:   "test-app-id",
			Hostname:  "test-hostname",
			Priority:  rfc5424.Priority(14),
			ProcessID: "[APP/1]",
			Message:   []byte("message 1\n"),
		}
		Expect(drain.messages[0].AppName).To(Equal(expected.AppName))
		Expect(drain.messages[0].Hostname).To(Equal(expected.Hostname))
		Expect(drain.messages[0].Priority).To(BeEquivalentTo(expected.Priority))
		Expect(drain.messages[0].ProcessID).To(Equal(expected.ProcessID))
		Expect(drain.messages[0].Message).To(Equal(expected.Message))
		expected = &rfc5424.Message{
			AppName:   "test-app-id",
			Hostname:  "test-hostname",
			Priority:  rfc5424.Priority(14),
			ProcessID: "[APP/2]",
			Message:   []byte("message 2\n"),
		}
		Expect(drain.messages[1].AppName).To(Equal(expected.AppName))
		Expect(drain.messages[1].Hostname).To(Equal(expected.Hostname))
		Expect(drain.messages[1].Priority).To(BeEquivalentTo(expected.Priority))
		Expect(drain.messages[1].ProcessID).To(Equal(expected.ProcessID))
		Expect(drain.messages[1].Message).To(Equal(expected.Message))
	})

	It("test early dispatch on high message load", func() {
		env1 := buildLogEnvelope("APP", "1", "string to get log to 1024 characters:"+string_to_1024_chars, loggregator_v2.Log_OUT)
		for i := 0; i < 300; i++ {
			writer.Write(env1)
		}
		Expect(drain.messages).To(HaveLen(256))
	})

	It("test batch dispatching with all logs in a given timeframe", func() {
		env1 := buildLogEnvelope("APP", "1", "string to get log to 1024 characters:"+string_to_1024_chars, loggregator_v2.Log_OUT)
		for i := 0; i < 10; i++ {
			writer.Write(env1)
			time.Sleep(99 * time.Millisecond)
		}
		time.Sleep(100 * time.Millisecond)
		Expect(drain.messages).To(HaveLen(10))
	})
```

Review comment: In general in these tests can we use […]

Reply: These are not final yet.

```go
	It("probabilistic test for race condition", func() {
		env1 := buildLogEnvelope("APP", "1", "string to get log to 1024 characters:"+string_to_1024_chars, loggregator_v2.Log_OUT)
		for i := 0; i < 10; i++ {
			writer.Write(env1)
			time.Sleep(99 * time.Millisecond)
		}
		time.Sleep(100 * time.Millisecond)
		Expect(drain.messages).To(HaveLen(10))
	})
})

func newBatchMockDrain(status int) *SpyDrain {
	drain := &SpyDrain{}
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		Expect(err).ToNot(HaveOccurred())
		defer r.Body.Close()

		println(body)

		message := &rfc5424.Message{}

		messages := bytes.SplitAfter(body, []byte("\n"))
		for _, raw := range messages {
			if bytes.Equal(raw, []byte("")) {
				continue
			}
			message = &rfc5424.Message{}
			err = message.UnmarshalBinary(raw)
			Expect(err).ToNot(HaveOccurred())
			drain.messages = append(drain.messages, message)
			drain.headers = append(drain.headers, r.Header)
		}
		w.WriteHeader(status)
	})
	server := httptest.NewTLSServer(handler)
	drain.Server = server
	return drain
}
```
Review comment: We understand the goal of the HTTPSBatchWriter to be: ❓ Is that right?

Currently, HTTPSBatchWriter and TriggerTimer don't appear to do that. We think that code actually results in the following behaviour: ➡️ Can you please have a look at adjusting this code.

Here are some examples of ways we've done batching in the past:
Reply: First thing: yes, your understanding of what we are trying to do seems right.

I have pushed a newer version that should comply with the envisioned behaviour. Tests for the new version are also added, confirming that the implementation complies with the wanted behaviour. I am currently looking through your pointers to see if I can leverage some of the implementations shown there.
Reply: Regarding the code proposed: the implementation for the signal batcher uses slices and an append structure. It knows in advance how long the batch is going to be, which speeds up execution. In contrast, the syslog batching feature will not know beforehand how long batches are going to be, giving the speed edge to bytes.Buffer:
https://stackoverflow.com/questions/39319024/builtin-append-vs-bytes-buffer-write
Also, the actual buffering code is pretty short, and the type expected when sending data with the HTTP client used is []byte, so the choice still seems obvious to me.
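The point about not knowing the batch length up front can be seen in a minimal sketch (illustrative only; `appendBatch` is an invented name): bytes.Buffer grows as needed while messages arrive, and `Bytes()` yields the `[]byte` payload the HTTP client ultimately needs.

```go
package main

import (
	"bytes"
	"fmt"
)

// appendBatch accumulates an unknown number of syslog lines into one
// payload; bytes.Buffer manages growth internally, so no pre-sizing is needed.
func appendBatch(msgs [][]byte) []byte {
	var batch bytes.Buffer
	for _, m := range msgs {
		batch.Write(m) // Write on bytes.Buffer always returns a nil error
	}
	return batch.Bytes() // the []byte form expected when posting the batch
}

func main() {
	payload := appendBatch([][]byte{
		[]byte("<14>1 one\n"),
		[]byte("<14>1 two\n"),
	})
	fmt.Println(len(payload)) // prints 20
}
```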
Reply: The code for the length-triggered timer shown in the ingress client is, from what I can tell, the code I would also need for syslog batching. I was able to get the approach I was working on into a working, non-blocking state, and as a less experienced Go programmer I have a much easier time understanding what is happening in the code I provided. However, if you would rather have an implementation similar to what is already in the codebase, I am fine with adopting it that way. I was not able to parse out test cases that really confirm that the ingress_client implementation does what is expected, so it was a little hard to read.