Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GELF events not logged #43

Closed
Thaoden opened this issue May 8, 2019 · 25 comments
Closed

GELF events not logged #43

Thaoden opened this issue May 8, 2019 · 25 comments
Labels
bug Something isn't working

Comments

@Thaoden
Copy link

Thaoden commented May 8, 2019

I currently have problems when trying to ingest events in GELF. I have a working Seq server 5.1.3004 that runs on windows and I have installed the seq.input.gelf app as explained on this page.

However, when trying to send an event via ncat like so:

echo -n '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 1, "timestamp": 1557327843, "_some_info": "foo" }' | ncat -w5 -u my.log.server 12201

nothing happens, even when sent from the same machine.

When sending a wrong format, eg by setting the level to a string, at least an error message is logged:

echo -n '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": "DEBUG", "timestamp": 1557327843, "_some_info": "foo" }' | ncat -w5 -u my.log.server 12201

{"@ t":"2019-05-08T15:18:50.6925401Z","@ mt":"GELF processing failed","@ m":"GELF processing failed","@ i":"07b8c06e","@ l":"ERROR","@ x":"invalid type: string "DEBUG", expected u8 at line 1 column 95","AppId":"hostedapp-68","AppInstanceId":"appinstance-70"}

Any hints as to where I went in the wrong direction? Thanks!

@KodrAus
Copy link
Member

KodrAus commented May 8, 2019

Hi @Thaoden 👋

By "nothing happens" do you mean the event doesn't appear in Seq?

Just giving this a quick test locally, it looks like the GELF server does at least accept this event:

$ cargo run
$ echo '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 1, "timestamp": 1557327843, "_some_info": "foo" }' | nc -w1 -u 127.0.0.1 12201

writes:

{"@t":"2019-05-08T15:04:03Z","@l":"alert","@m":"A short message","host":"example.org","some_info":"foo"}

to stdout.

@Thaoden
Copy link
Author

Thaoden commented May 9, 2019

Hi @KodrAus

Exactly, the event doesn't appear in Seq.

In a second terminal window, I listened to 12201 with ncat while the GELF app was not running (nc -luv 12201), this wrote the message back to the second terminal.

I enabled the diagnostics from the app instance and it seems the event is properly received and processed:

{"@ t":"2019-05-09T11:56:39.5908288Z","@ mt":"Collected GELF server metrics","@ m":"Collected GELF server metrics","@ i":"1fcf524e","@ l":"DEBUG","process":{"msg":1},"receive":{"chunk":0,"msg_chunked":0,"msg_unchunked":1,"overflow_incomplete_chunks":0},"server":{"process_err":0,"process_ok":1,"receive_err":0,"receive_ok":1},"AppId":"hostedapp-129","AppInstanceId":"appinstance-131"}

You use cargo in your answer. Do I need Rust / Cargo on my machine so that this can work?

EDIT: Apparently, this works now with version 1.0.208-dev of GELF-input.

@KodrAus
Copy link
Member

KodrAus commented May 9, 2019

Thanks @Thaoden! Asking about the new metrics was going to be my next question 🙂

Hmm, so after updating the app messages are showing up in Seq now?

@KodrAus
Copy link
Member

KodrAus commented May 9, 2019

Ah sorry, you shouldn’t need Rust to test the GELF input unless you want to clone and run this repo directly. We build self-contained binaries for nuget.org

@Thaoden
Copy link
Author

Thaoden commented May 10, 2019

Hmm, so after updating the app messages are showing up in Seq now?

Exactly. I don't know if you changed something (didn't look at the changelog yet), but it seems to be working now :)

@KodrAus
Copy link
Member

KodrAus commented May 12, 2019

Glad to hear it's working! We didn't make any changes to the server itself besides adding metrics, but if you ended up with an older version of the app before then there could've been more changes rolled in.

Is everything still good?

@Thaoden
Copy link
Author

Thaoden commented May 13, 2019

It is, at least when sending from the command line (I couldn't convince my logging lib to output GELF events yet) - but I guess it wouldn't make any difference for the log server where those events come from (command line vs logging lib).

@Thaoden Thaoden closed this as completed May 13, 2019
@Thaoden
Copy link
Author

Thaoden commented May 13, 2019

So, for some reason, the ingestion stopped. It took up again when I re-enabled the diagnostics. It was running before with diagnostics disabled.

@Thaoden Thaoden reopened this May 13, 2019
@KodrAus
Copy link
Member

KodrAus commented May 13, 2019

Ah right, this could be the same issue a few other users have reported on the support forum that we’re currently working through.

I’ve been thinking it might’ve been load related, but so far my burn in testing hasn’t replicated that behaviour.

@Thaoden
Copy link
Author

Thaoden commented May 13, 2019

I dont't produce much load on my Seq server (yet), currently, it's ~30 events every 3 minutes. Any other information I can provide to help in debugging?

@Thaoden
Copy link
Author

Thaoden commented May 13, 2019

After receiving this diagnostic event:

{"@ t":"2019-05-13T13:37:07.5171133Z","@ mt":"Collected GELF server metrics","@ m":"Collected GELF server metrics","@ i":"1fcf524e","@ l":"DEBUG","process":{"msg":637},"receive":{"chunk":1,"msg_chunked":0,"msg_unchunked":637,"overflow_incomplete_chunks":0},"server":{"process_err":0,"process_ok":637,"receive_err":0,"receive_ok":637},"AppId":"hostedapp-68","AppInstanceId":"appinstance-98"}

ingestion stopped again. So maybe it is chunk-related?

@KodrAus
Copy link
Member

KodrAus commented May 13, 2019

Thanks for the extra details @Thaoden! That looks like a promising angle to explore.

@Thaoden
Copy link
Author

Thaoden commented May 13, 2019

And another update: The first message that did not get ingested was roughly 20kb in size cough

Maybe that's a problem?

@Thaoden
Copy link
Author

Thaoden commented May 14, 2019

After dumping the UDP traffic and investigating the packets, it seems to be an issue with the logging library I use; apparently it advertises multiple chunks of a message but then only sends the first (not 100% sure about it though).

@Thaoden
Copy link
Author

Thaoden commented Jun 12, 2019

I have seen there is a new patch for the Seq server, however, even with Seq version 5.1.3118 and GELF input version 1.0.226-dev, GELF event ingestion stops and is resumed only when restarting the app.

Any hints or anything I can do to help?

@KodrAus
Copy link
Member

KodrAus commented Jun 12, 2019

Hi @Thaoden, thanks for the update.

Is it possible to give the experimental TCP support a go in your environment? You can enable it by changing your GELF address in your app instance to include the tcp:// protocol at the front. If you’re running TCP with extra diagnostics then you should see an event raised saying GELF is listening on TCP instead of UDP.

I’d be curious to know whether you see the same behaviour with TCP, and whether it manifests itself as the accept socket disappearing, or active connections going dark and timing out.

@Thaoden
Copy link
Author

Thaoden commented Jun 13, 2019

I can try the TCP support next week I think, but will let you know of the outcome.

@Thaoden
Copy link
Author

Thaoden commented Jun 26, 2019

Sorry for the delay.

I configured the app to listen on the TCP port instead of the UDP port. All messages were then logged as expected.

As written before, I seems the logging library I use advertises multiple chunks of a message but then only sends the first; if I understand correctly, TCP doesn't have a notion of chunking, so I cannot simulate this.

@KodrAus
Copy link
Member

KodrAus commented Jun 26, 2019

Thanks for the update @Thaoden! Glad to here the TCP implementation is holding up so far.

Yeh, without chunking being a factor we should be skirting around that issue in the logging library. There is another issue at the UDP transport layer where we’d lose our socket and silently fail to receive messages. It sounds like you haven’t run into that one with TCP yet either.

Please feel free to reach out if there’s any change!

@Thaoden
Copy link
Author

Thaoden commented Jun 26, 2019

@KodrAus What is the other issue with UDP? Maybe it's affecting the logging lib I use in some weird way, the maintainer cannot reproduce the chunk-advertised-but-not-sent issue I face.

@KodrAus
Copy link
Member

KodrAus commented Jun 28, 2019

That’s this issue reported on the support forum. The UDP socket gets lost somewhere and no new messages are processed. If you’re seeing this behaviour then you won’t receive any messages, chunked or individual, until the server is restarted.

@Thaoden
Copy link
Author

Thaoden commented Jun 28, 2019

Yup, I see that one too when using UDP. Last message that makes is through is the one before the advertised-but-not-sent-chunk. Though I don't need to restart the server, changing a configuration setting in the GELF ingestion app is enough for me to get it running again.

@KodrAus
Copy link
Member

KodrAus commented Sep 12, 2019

Hi @Thaoden 👋

We've just finished upgrading the async stack used in the sqelf server to Rust's new async/await syntax. That simplifies the control flow a lot and appears to have shaken out some problematic interactions with our framing in UDP in particular.

It's published now on NuGet.org and Docker Hub as version 2.0.253. If you have a chance to give it a try I'd be keen to here how you go.

@KodrAus KodrAus added the bug Something isn't working label Sep 12, 2019
@Thaoden
Copy link
Author

Thaoden commented Oct 8, 2019

Hey @KodrAus , thanks for the update!

I'll let you know once I find the time to try it!

@KodrAus
Copy link
Member

KodrAus commented Nov 28, 2019

This issue has been around for a while and a lot has changed internally in sqelf, including porting to async/await`, which appears to have fixed things, and adding TCP. So I’ll go ahead and close this one now, but we can open fresh issues for anything new that comes up.

Thanks for helping us debug this @Thaoden!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants