Change `Write([]byte)` to log full message to `message` field #11

0cjs · 2017-07-19T10:59:50Z

[This is in part a continuation of a discussion started in PR #9 where we were also worried about message length.]

The message field (sent as short_message in GELF) and the full_message field are stored in the exact same way in Elasticsearch, so presumably they support the same sizes. I've actually tested message only up to 80 KB (via a null-delimited "RAW/Plaintext" interface) and 2500 lines or so, however. :-) It works fine, and the Graylog UI seems to do a reasonable job of dealing with it. (I have seen a complaint that long fields don't work, but that appears to be because they changed the default configuration to do term analysis on the message field, which isn't done by default.)

Having looked into this a bit, I find:

full_message is optional to send in GELF, unlike message.
full_message is not parsed for things like sources and timestamps.
On its syslog inputs, Graylog has a store_full_message: true option which stores the raw syslog protocol message (usually RFC 5424 in the full_message field. This seems useful for debugging and handling situations where bad messages come in that the parser can't handle. We turned this on when we set up Graylog because it seemed like a good idea at the time, and it still doesn't strike me as a bad thing to do.
Graylog search functionality seems to be oriented mainly towards message and it seems to take a bit of extra work to search full_message instead.

From all this, I get the sense that full_message is considered something that can be safely ignored by clients retrieving and processing messages from Graylog. Given that, I think it's a bad idea, when given a message via the io.Writer interface, not to put the entire message into the message field.

As background, let me tell you about my use of Graylog at work. We have dozens of servers generating log messages into the Systemd Journal that we forward to a central Graylog installation so we can easily query across servers, do centralized analysis (such as with OSSEC LIDS log analysis), keep historical records, and so on.

In order to keep things as simple as possible, I'm aiming to get all our non-journal logging also going into the journal, including logs from programs that write directly to /var/log such as auditd, output of jobs run by cron, and so on. (For example, I'm intending to replace /usr/sbin/sendmail on those servers with a program that eats the message and logs it to the journal.)

One of the questions I had to ask myself is whether I should be using the full_message field, and what with various programs partially duplicating information across message and full_message, that seems like a bad idea. How would the OSSEC log scanner, for example, know that for some log entries full_message is all the information in message and a few other fields in unparsed form, yet for other messages message could be an incomplete log message and full_message contains the complete message but without other log entry information such as the timestamp? That way lies a lot of pain, especially if I end up having to assemble new versions of log entries for log-scanning software that expects things to be re-assembled in a syslog-ish format with the timestamp, source etc. all in one big string.

So for these reasons, I'm suggesting we change the io.Writer API, Write([]byte), of go-gelf to record the entire bytestring it's handed in the message field and not send a full_message field. I think it's reasonable for it to continue adding things like the extra fields for file and line number of the caller.

Note that if anybody does want to use the full_message field for something, or any other fields for that matter, we have the Write(*Message) interface for that which allows (or should allow) full control over the GELF message being sent, within the limitations of the protocol. (We should not allow, e.g., a GELF message to be sent without a message field since that field is required by the protocol.)

If we're agreed on this, I'm happy to submit a PR for changes to the code and update the documentation appropriately, including this rationale in an appropriate place (probably the commit message).

The text was updated successfully, but these errors were encountered:

mariussturm · 2017-07-19T12:47:22Z

Makes sense to me! The short_message field is mandatory while full_message is optional.
So the best solution would be to allow the user to decide which part of the message should go into which field. But because of the nature of this library of implementing io.Writer we can't provide additional flexibility easily. We could think of some way of providing a split function to the writer. But in the end using only the message field should be what most people expect.

0cjs · 2017-07-19T12:52:22Z

Oh, I wasn't clear in the original post; we also have the Write(*Message) interface that allows clients to set up whatever arbitrary GELF messages they like, so there's no problem if they want to do something in particular. (We should check it to make sure it gives maximum flexibility, of course; I'll try to compare it with the spec. tomorrow.) Write([]byte) is just a convenience method, really, for those who want to just log some sort of message and not be bothered with the details of all the different ways they could do it.

I'll update the issue above with this information.

ghislainbourgeois · 2017-07-19T13:47:40Z

I would suggest at the same time updating the documentation of the GELF payload to reflect this understanding that the short_message field really means message. It would be helpful if it was made clear that the short_message field in GELF maps to the message field in Graylog.

mariussturm · 2017-08-02T12:40:45Z

Please use the branch v2 as base for upcoming API changes!

0cjs · 2017-08-04T02:13:21Z

I take it you've decided not to bring in this change for v2, but delay it until v3?

I'm not clear on how using v2 as the base works. If you accept an API change and merge it on to master, and someone else has another change for the new version, if they base it off of v2 they won't be testing the other changes already on master.

My suggestion is this:

Have master and v2 point to the same commit for the moment.
Delay, for a little bit, bringing in any API-breaking changes.
Bring in non-API-breaking changes (e.g., some README fixes I have lined up) on both master and v2 by having both refs move forward on the same commits together.
When we have API-breaking changes to bring in, bring them in on master and live with having to do each change we want to bring into v2 as well a second time on v2.

mariussturm · 2017-08-04T08:18:05Z

v2 is master currently, so if there is no other work going on it stays the same. When there are API changes and refactorings we will branch v3 of of it. Just that you take the merged TCP feature into account when you do something there.

0cjs · 2017-08-04T08:42:26Z

So for non-API-changing commits, I should be submitting PRs to v2 and you'll put them on master as well? You want commits that change the API also submitted for v2 and you'll take care of redirecting them appropriately?

That's fine, but I'd suggest you document this somewhere obvious from the main page for this repo, because people are unlikely to notice it buried deep in this issue.

Following the principle of least surprise, we write messages completely into thethe `short_message` field, even if they contain several lines. As a consequence, we deprecate the `full_message` field, as it becomes redundant. Resolves Graylog2#11

Following the principle of least surprise, we write messages completely into the `short_message` field, even if they contain several lines. To not break gelf standard, we also write the full message, always, to the `full_message` field. Resolves Graylog2#11

0cjs mentioned this issue Jul 19, 2017

Add TCP support for writing logs #9

Merged

valentin-krasontovitsch mentioned this issue Apr 17, 2018

Feat: Write full message into short_message field #24

Closed

valentin-krasontovitsch mentioned this issue Dec 4, 2019

Feat: Write full message into short_message field #34

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change `Write([]byte)` to log full message to `message` field #11

Change `Write([]byte)` to log full message to `message` field #11

0cjs commented Jul 19, 2017 •

edited

Loading

mariussturm commented Jul 19, 2017

0cjs commented Jul 19, 2017

ghislainbourgeois commented Jul 19, 2017

mariussturm commented Aug 2, 2017

0cjs commented Aug 4, 2017

mariussturm commented Aug 4, 2017

0cjs commented Aug 4, 2017

Change Write([]byte) to log full message to message field #11

Change Write([]byte) to log full message to message field #11

Comments

0cjs commented Jul 19, 2017 • edited Loading

mariussturm commented Jul 19, 2017

0cjs commented Jul 19, 2017

ghislainbourgeois commented Jul 19, 2017

mariussturm commented Aug 2, 2017

0cjs commented Aug 4, 2017

mariussturm commented Aug 4, 2017

0cjs commented Aug 4, 2017

Change `Write([]byte)` to log full message to `message` field #11

Change `Write([]byte)` to log full message to `message` field #11

0cjs commented Jul 19, 2017 •

edited

Loading