Add syslog batching implementation #491
base: main
Conversation
In general, it looks fine to me. I don't see where it adds the newline character to delimit syslog lines, though...
Would love to see a demo at the next ARP WG meeting.
(I sort of disregarded that this was a POC at points in time and some of my comments are more implementation-focused, sorry about that 😅 )
The newline is already part of the syslog messages, so it is added beforehand by an earlier method (linked for anyone curious). This is true for all possible syslog messages, so I do not even need to add it here, which is really convenient.
This addresses all the comments by @ctlong. It removes the unneeded if/else branching when adding to the message batch, which is just not necessary anymore. It also fixes the egressMetric to behave like the single-message implementation and not count erroneous logs.
Addressed all the comments and additions by @ctlong above; if sufficient, please close the threads :)
The refactor is mainly reshuffling. The new timer implementation makes the actual logic clearer, and might also prevent some unresolvable states. It now has only two states:
- Running, if a batch is not yet full and no time trigger has fired
- Not running, if a batch was sent, either through a time-based or a size-based trigger
I still have some specific concerns, which I've left as comments in this review.
In general, the implementation looks fine, though I'm not sure I understand the necessity of the new TriggerTimer struct.
@nicklas-dohrn can you please sign the CLA. We can't merge this unless you've done so.
@ctlong I will take care of the CLA. @nicklas-dohrn has to be added to one of our GitHub orgs.
This is a new approach to switching between plain HTTPS and HTTPS batching. It differs from the previous attempts only in this regard, and otherwise contains only refactorings.
Conceptually, I think this proof of concept is correct. Implementation-wise the timer still has some issues.
Once those are fixed, I would suggest rebasing this off #573 and testing the two changes together to see if it achieves the throughput you want. Then we're all ready for a real implementation (with tests).
🙏 Could you please also update the PR description, thanks.
fe08849 to c937231
```go
const BATCHSIZE = 256 * 1024

type HTTPSBatchWriter struct {
```
We understand the goal of the HTTPSBatchWriter to be:
- Buffer & batch incoming envelopes.
- If the batch reaches a certain size, flush the batch to a destination.
- On some interval, flush the batch to a destination.
❓ Is that right?
Currently, HTTPSBatchWriter and TriggerTimer don't appear to do that. We think that code actually results in the following behaviour:
- Add one message to the batch and sleep for some interval, then flush to a destination.
- After that, add messages to the batch as they come in.
- If the batch reaches a certain size, flush the batch to a destination.
➡️ Can you please have a look at adjusting this code.
Here are some examples of ways we've done batching in the past:
First thing: yes, your understanding of what we are trying to do seems right.
I have pushed a newer version that should comply with the envisioned behaviour.
Tests for the new version are also added, confirming that the implementation complies with the wanted behaviour.
I am currently looking through your pointers to see if I can leverage some of the implementations shown there.
Regarding the code proposed: the implementation for the signal batcher uses slices and an append structure. It knows in advance how long the batch is going to be, which speeds up execution. In contrast, the syslog batching feature will not know beforehand how long batches are going to be, giving the speed edge to bytes.Buffer:
https://stackoverflow.com/questions/39319024/builtin-append-vs-bytes-buffer-write
Also, the actual buffering code is pretty short, and the type the HTTP client expects for sending data is []byte, so the choice still seems obvious to me.
```go
	egrMsgCount float64
}

func NewHTTPSBatchWriter(
```
Can you please add tests for this writer.
Some tests are already added. Do we need tests for the things already covered by the httpsWriter (error handling and the like, which is the same anyway)?
```go
	c *Converter,
) egress.WriteCloser {
	client := httpClient(netConf, tlsConf)
	binding.URL.Scheme = "https" // reset the scheme for usage to a valid http scheme
```
What's the purpose of changing the scheme here?
The scheme is used to differentiate between the different endpoints (https and https-batched).
If I do not change it back to https for sends, the queried URL would be https-batched://..., which just does not work.
Does mutating the scheme here affect the metrics emitted?
I will check for the metrics being changed here.
```go
	c *Converter,
) egress.WriteCloser {
	client := httpClient(netConf, tlsConf)
	binding.URL.Scheme = "https" // reset the scheme for usage to a valid http scheme
```
Does mutating the scheme here affect the metrics emitted?
```go
)

var triggered = 0
var string_to_1024_chars = "saljdflajsdssdfsdfljkfkajafjajlköflkjöjaklgljksdjlakljkflkjweljklkwjejlkfekljwlkjefjklwjklsdajkljklwerlkaskldgjksakjekjwrjkljasdjkgfkljwejklrkjlklasdkjlsadjlfjlkadfljkajklsdfjklslkdfjkllkjasdjkflsdlakfjklasldfkjlasdjfkjlsadlfjklaljsafjlslkjawjklerkjljklasjkdfjklwerjljalsdjkflwerjlkwejlkarjklalkklfsdjlfhkjsdfkhsewhkjjasdjfkhwkejrkjahjefkhkasdjhfkashfkjwehfkksadfjaskfkhjdshjfhewkjhasdfjdajskfjwehkfajkankaskjdfasdjhfkkjhjjkasdfjhkjahksdf"
```
I find the name here a little confusing. The string is ~440 bytes in length.
Yes, this might be confusing. I just did it that way so as not to type far too many characters to reach the 1024.
```go
	time.Sleep(99 * time.Millisecond)
}
time.Sleep(100 * time.Millisecond)
Expect(drain.messages).To(HaveLen(10))
```
In general, in these tests, can we use Eventually rather than relying on timing?
These are not final yet.
This test especially is meant to check that the time window is exactly 1s, as defined in the class, and not something else.
Rereading it now, it does not exactly test that, but an Eventually would not test whether the defined timings are adhered to.
I reimplemented the changes using a similar approach to what @ctlong proposed.
I did some elaborate testing on the current and new approach for syslog-batching, sending from our dev cf landscape with 4 diego cells and 4 loggregator agents to a cls instance.
Generally it looks fine; I found two little things.
I will wait on @ctlong for his review.
Description
This is our proposal to implement syslog batching for sending logs via HTTPS.
It includes a switch between the normal one-log-per-request syslog mode and batching, controlled by a syslog query parameter: batching=true.
If you enable the syslog batching behaviour, it will write syslog batches in which single messages are newline-delimited (\n).
Currently, the batch sizes are hardwired to around 256 KB, which is already sufficient to speed up throughput by a factor of at least 10x.
Making it configurable would be an option, but I have not seen the need so far.
Please let me know what you think of the current approach.