Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Write([]byte) to log full message to message field #11

Open
0cjs opened this issue Jul 19, 2017 · 7 comments
Open

Change Write([]byte) to log full message to message field #11

0cjs opened this issue Jul 19, 2017 · 7 comments

Comments

@0cjs
Copy link

0cjs commented Jul 19, 2017

[This is in part a continuation of a discussion started in PR #9 where we were also worried about message length.]

The message field (sent as short_message in GELF) and the full_message field are stored in the exact same way in Elasticsearch, so presumably they support the same sizes. I've actually tested message only up to 80 KB (via a null-delimited "RAW/Plaintext" interface) and 2500 lines or so, however. :-) It works fine, and the Graylog UI seems to do a reasonable job of dealing with it. (I have seen a complaint that long fields don't work, but that appears to be because they changed the default configuration to do term analysis on the message field, which isn't done by default.)

Having looked into this a bit, I find:

  1. full_message is optional to send in GELF, unlike message.

  2. full_message is not parsed for things like sources and timestamps.

  3. On its syslog inputs, Graylog has a store_full_message: true option which stores the raw syslog protocol message (usually RFC 5424 in the full_message field. This seems useful for debugging and handling situations where bad messages come in that the parser can't handle. We turned this on when we set up Graylog because it seemed like a good idea at the time, and it still doesn't strike me as a bad thing to do.

  4. Graylog search functionality seems to be oriented mainly towards message and it seems to take a bit of extra work to search full_message instead.

From all this, I get the sense that full_message is considered something that can be safely ignored by clients retrieving and processing messages from Graylog. Given that, I think it's a bad idea, when given a message via the io.Writer interface, not to put the entire message into the message field.

As background, let me tell you about my use of Graylog at work. We have dozens of servers generating log messages into the Systemd Journal that we forward to a central Graylog installation so we can easily query across servers, do centralized analysis (such as with OSSEC LIDS log analysis), keep historical records, and so on.

In order to keep things as simple as possible, I'm aiming to get all our non-journal logging also going into the journal, including logs from programs that write directly to /var/log such as auditd, output of jobs run by cron, and so on. (For example, I'm intending to replace /usr/sbin/sendmail on those servers with a program that eats the message and logs it to the journal.)

One of the questions I had to ask myself is whether I should be using the full_message field, and what with various programs partially duplicating information across message and full_message, that seems like a bad idea. How would the OSSEC log scanner, for example, know that for some log entries full_message is all the information in message and a few other fields in unparsed form, yet for other messages message could be an incomplete log message and full_message contains the complete message but without other log entry information such as the timestamp? That way lies a lot of pain, especially if I end up having to assemble new versions of log entries for log-scanning software that expects things to be re-assembled in a syslog-ish format with the timestamp, source etc. all in one big string.

So for these reasons, I'm suggesting we change the io.Writer API, Write([]byte), of go-gelf to record the entire bytestring it's handed in the message field and not send a full_message field. I think it's reasonable for it to continue adding things like the extra fields for file and line number of the caller.

Note that if anybody does want to use the full_message field for something, or any other fields for that matter, we have the Write(*Message) interface for that which allows (or should allow) full control over the GELF message being sent, within the limitations of the protocol. (We should not allow, e.g., a GELF message to be sent without a message field since that field is required by the protocol.)

If we're agreed on this, I'm happy to submit a PR for changes to the code and update the documentation appropriately, including this rationale in an appropriate place (probably the commit message).

@mariussturm
Copy link

Makes sense to me! The short_message field is mandatory while full_message is optional.
So the best solution would be to allow the user to decide which part of the message should go into which field. But because of the nature of this library of implementing io.Writer we can't provide additional flexibility easily. We could think of some way of providing a split function to the writer. But in the end using only the message field should be what most people expect.

@0cjs
Copy link
Author

0cjs commented Jul 19, 2017

Oh, I wasn't clear in the original post; we also have the Write(*Message) interface that allows clients to set up whatever arbitrary GELF messages they like, so there's no problem if they want to do something in particular. (We should check it to make sure it gives maximum flexibility, of course; I'll try to compare it with the spec. tomorrow.) Write([]byte) is just a convenience method, really, for those who want to just log some sort of message and not be bothered with the details of all the different ways they could do it.

I'll update the issue above with this information.

@ghislainbourgeois
Copy link

I would suggest at the same time updating the documentation of the GELF payload to reflect this understanding that the short_message field really means message. It would be helpful if it was made clear that the short_message field in GELF maps to the message field in Graylog.

@mariussturm
Copy link

Please use the branch v2 as base for upcoming API changes!

@0cjs
Copy link
Author

0cjs commented Aug 4, 2017

I take it you've decided not to bring in this change for v2, but delay it until v3?

I'm not clear on how using v2 as the base works. If you accept an API change and merge it on to master, and someone else has another change for the new version, if they base it off of v2 they won't be testing the other changes already on master.

My suggestion is this:

  1. Have master and v2 point to the same commit for the moment.
  2. Delay, for a little bit, bringing in any API-breaking changes.
  3. Bring in non-API-breaking changes (e.g., some README fixes I have lined up) on both master and v2 by having both refs move forward on the same commits together.
  4. When we have API-breaking changes to bring in, bring them in on master and live with having to do each change we want to bring into v2 as well a second time on v2.

@mariussturm
Copy link

v2 is master currently, so if there is no other work going on it stays the same. When there are API changes and refactorings we will branch v3 of of it. Just that you take the merged TCP feature into account when you do something there.

@0cjs
Copy link
Author

0cjs commented Aug 4, 2017

So for non-API-changing commits, I should be submitting PRs to v2 and you'll put them on master as well? You want commits that change the API also submitted for v2 and you'll take care of redirecting them appropriately?

That's fine, but I'd suggest you document this somewhere obvious from the main page for this repo, because people are unlikely to notice it buried deep in this issue.

valentin-krasontovitsch pushed a commit to valentin-krasontovitsch/go-gelf that referenced this issue Apr 17, 2018
Following the principle of least surprise, we write messages completely into
thethe `short_message` field, even if they contain several lines.

As a consequence, we deprecate the `full_message` field, as it becomes
redundant.

Resolves Graylog2#11
valentin-krasontovitsch pushed a commit to valentin-krasontovitsch/go-gelf that referenced this issue Apr 17, 2018
Following the principle of least surprise, we write messages completely into
thethe `short_message` field, even if they contain several lines.

As a consequence, we deprecate the `full_message` field, as it becomes
redundant.

Resolves Graylog2#11
valentin-krasontovitsch added a commit to valentin-krasontovitsch/go-gelf that referenced this issue Dec 4, 2019
Following the principle of least surprise, we write messages completely
into the `short_message` field, even if they contain several lines.

To not break gelf standard, we also write the full message, always, to
the `full_message` field.

Resolves Graylog2#11
valentin-krasontovitsch added a commit to valentin-krasontovitsch/go-gelf that referenced this issue Dec 4, 2019
Following the principle of least surprise, we write messages completely
into the `short_message` field, even if they contain several lines.

To not break gelf standard, we also write the full message, always, to
the `full_message` field.

Resolves Graylog2#11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants