Syslog TCP RST sent by Graylog #1105

Closed
lboue opened this Issue Apr 10, 2015 · 17 comments

lboue commented Apr 10, 2015

Hello,

I send syslog messages from a firewall to Graylog over TCP. The messages arrive fine, but after a while Graylog closes the TCP session with an RST packet. I do not understand this behavior.

Could you help me, please?

Here are the versions I use:

rpm -qa | grep graylog
graylog-1.0-repository-el6-1.2.0-1.noarch
graylog-web-1.0.1-1.noarch
graylog-server-1.0.1-1.noarch

Kernel 2.6.32-358.el6.x86_64

And the input in the cluster:

Syslog TCP 1514 (Syslog TCP) running
Network IO:  16,9kiB  0B (total:  13,0GiB  0B )
Total connections: 11512 (2 active)
recv_buffer_size: 1048576
port: 1514
tls_key_file:
tls_key_password: *******
max_message_size: 2097152
override_source:
allow_override_date: true
bind_address: 0.0.0.0
tls_cert_file:

Regards,

lboue commented Apr 27, 2015

Hello,

Can anyone help me?

@bernd bernd self-assigned this Jul 29, 2015

@bernd bernd added the bug label Jul 29, 2015

Member

bernd commented Jul 29, 2015

Another report from the mailing list that seems to be related:

I'm using syslog-ng to feed in data via a syslog/TCP channel and it's continually (every 10 seconds) dropping the TCP channel - forcing syslog-ng to restart it

2015-07-29T02:26:31+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection broken; fd='408', server='AF_INET(192.168.6.3:1514)', time_reopen='10'
2015-07-29T02:26:41+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection established; fd='465', server='AF_INET(192.168.6.3:1514)', local='AF_INET(0.0.0.0:0)'
2015-07-29T02:26:41+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection broken; fd='465', server='AF_INET(192.168.6.3:1514)', time_reopen='10'
2015-07-29T02:26:51+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection established; fd='379', server='AF_INET(192.168.6.3:1514)', local='AF_INET(0.0.0.0:0)'
2015-07-29T02:26:51+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection broken; fd='379', server='AF_INET(192.168.6.3:1514)', time_reopen='10'
2015-07-29T02:27:01+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection established; fd='476', server='AF_INET(192.168.6.3:1514)', local='AF_INET(0.0.0.0:0)'
2015-07-29T02:27:02+00:00 syslog.server syslog notice syslog-ng[30512]: Syslog connection broken; fd='476', server='AF_INET(192.168.6.3:1514)', time_reopen='10'
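
The excerpt can be read as connections that last about 10 seconds, or as connections that break almost immediately and are retried after `time_reopen='10'`. The following is a small sketch for telling the two apart by pairing each "established" line with the "broken" line for the same fd. The function name `connection_lifetimes` is hypothetical, and the log lines are the ones above, trimmed after the `fd` field.

```python
import re
from datetime import datetime

# The seven syslog-ng lines from the report, trimmed after the fd field.
PREFIX = "syslog.server syslog notice syslog-ng[30512]: Syslog connection"
LOG_LINES = [
    f"2015-07-29T02:26:31+00:00 {PREFIX} broken; fd='408'",
    f"2015-07-29T02:26:41+00:00 {PREFIX} established; fd='465'",
    f"2015-07-29T02:26:41+00:00 {PREFIX} broken; fd='465'",
    f"2015-07-29T02:26:51+00:00 {PREFIX} established; fd='379'",
    f"2015-07-29T02:26:51+00:00 {PREFIX} broken; fd='379'",
    f"2015-07-29T02:27:01+00:00 {PREFIX} established; fd='476'",
    f"2015-07-29T02:27:02+00:00 {PREFIX} broken; fd='476'",
]

EVENT = re.compile(r"^(\S+) .*Syslog connection (established|broken); fd='(\d+)'")


def connection_lifetimes(lines):
    """Map each fd to the seconds between its 'established' and 'broken' events."""
    opened = {}
    lifetimes = {}
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue
        timestamp = datetime.fromisoformat(match.group(1))
        state, fd = match.group(2), match.group(3)
        if state == "established":
            opened[fd] = timestamp
        elif fd in opened:
            # A 'broken' event without a matching 'established' (fd 408) is skipped.
            lifetimes[fd] = (timestamp - opened.pop(fd)).total_seconds()
    return lifetimes


lifetimes = connection_lifetimes(LOG_LINES)
print(lifetimes)  # {'465': 0.0, '379': 0.0, '476': 1.0}
```

With one-second timestamp resolution, each connection in this excerpt is reported broken within about a second of being established; the 10-second rhythm matches the `time_reopen='10'` value in the same lines.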

tcpdump shows normal data flow followed by two TCP resets coming back from the graylog-1.1.5 server - so it's definitely graylog that's borking.

BTW, this system is working: I'm seeing these syslogs flowing in - can do searches/etc - but I assume I'm losing some records due to this issue. I even created a xinetd.d based tcp service on the graylog server that just logged what it received to a file, configured the syslog server to send to both tcp channels - and it's running fine with no restarts (ie tcpdump of both ports only shows TCP resets on the graylog port not the xinetd port). So I think that implies it isn't the OS (CentOS-7)

Source: https://groups.google.com/forum/?hl=en#!topic/graylog2/YQ0MYvdp4G0

jhaar commented Jul 29, 2015

My syslog-ng.conf contains the following

destination d_graylog {
        tcp("192.168.66.3" port(9876));
        syslog("192.168.66.3" port(1514));
};
log { source(s_local);  destination(d_graylog);};

I have the tcp channel aimed at an xinetd service that just writes the content to disk - it doesn't crash/TCP-reset. The "syslog" channel goes to the TCP syslog Input channel in graylog and shows the TCP-reset issue

The reason I'm using "syslog" is because graylog didn't like the "tcp" option - the content just blackholes altogether. Graylog documentation about syslog-ng referred to using "syslog" (which I didn't even know existed as an option in syslog-ng) and that works - except for the TCP-reset issue

ngrep shows the traffic looking as follows - you can see the difference between the "tcp" (9876) and "syslog" (1514) channel formats

T 192.168.66.3.13:59786 -> 192.168.66.3.13:9876 [AP] <164>Jul 29 01:54:50 192.168.66.3.1 %ASA-4-106023: Deny udp src inside:192.168.66.3.185/6185 dst outside:192.168.66.3.111/3478 by access-group "acl_in" [0x0, 0x0].
T 192.168.66.3.13:43365 -> 192.168.66.3.13:1514 [AP] 168 
T 192.168.66.3.13:43365 -> 192.168.66.3.13:1514 [AP] <164>1 2015-07-29T01:54:50+00:00 192.168.66.3.1 %ASA-4-106023 - - - Deny udp src inside:192.168.66.3.185/6185 dst outside:192.168.66.3.111/3478 by access-group "acl_in" [0x0, 0x0].111 <22>1 2015-07-29T01:54:50+00:00 mailsrv .........
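
In the capture above, the port 9876 traffic starts directly with `<164>`, while each message on port 1514 is preceded by a decimal length and a space (`168 `, `111 `). That second layout looks like the octet-counting framing described in RFC 6587 (`MSG-LEN SP SYSLOG-MSG`). A minimal sketch of that framing follows, in Python rather than anything taken from syslog-ng or Graylog; the helper names `frame` and `split_frames` and the sample messages are made up for illustration.

```python
def frame(msg: bytes) -> bytes:
    """Prefix a syslog message with its length in bytes and one space."""
    return str(len(msg)).encode("ascii") + b" " + msg


def split_frames(stream: bytes) -> list:
    """Split a complete stream of octet-counted frames back into messages."""
    messages = []
    pos = 0
    while pos < len(stream):
        space = stream.index(b" ", pos)          # end of the length digits
        length = int(stream[pos:space].decode("ascii"))
        start = space + 1                        # first byte of the message
        messages.append(stream[start:start + length])
        pos = start + length                     # next frame begins here
    return messages


# Hypothetical sample messages, not taken from the capture.
sample = [
    b"<164>1 2015-07-29T01:54:50+00:00 fw app - - - Deny udp",
    b"<22>1 2015-07-29T01:54:50+00:00 mailsrv app - - - hello",
]
stream = b"".join(frame(m) for m in sample)
decoded = split_frames(stream)
```

Because the receiver relies on the length prefix rather than a newline to find message boundaries, it has to do the byte arithmetic exactly right for every frame.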

Member

bernd commented Jul 29, 2015

@jhaar Awesome, thank you for the configs.

Member

bernd commented Jul 29, 2015

@jhaar How many messages per second are you sending to that syslog input?

jhaar commented Jul 29, 2015

The syslog server sees bursts of up to 1000/sec - but with such repeated startup/shutdown of the graylog syslog input channel, I don't think it would reach those actual peaks (but it could reach 90% of course)

jhaar commented Jul 31, 2015

FYI I just replaced syslog-ng with rsyslog and still have the problem. It works - but is continually experiencing tcp-resets - once every few seconds

Member

bernd commented Jul 31, 2015

@jhaar Thanks for the update. I saw the issue as well but wasn't able to easily reproduce it yet. I only saw it every few minutes. I will let you know once I have more information.

jhaar commented Aug 4, 2015

FYI I just moved from syslog/TCP to syslog/UDP and immediately saw graylog messages/sec jump by a factor of 4!! We were losing 75% of all TCP syslogs due to this issue

Thankfully the central syslog server is in the same rack as the graylog server - so using UDP is OK - but that certainly wouldn't be possible over WANs/etc

lboue commented Aug 4, 2015

In my case I had to set up rsyslog between the syslog sender and graylog as a workaround

firewall ==>  rsyslog (syslog TCP/514) ==> graylog (syslog TCP/1514)

I am using syslog/TCP to ensure message integrity, and from the last post I understand that I could still lose messages. I see bursts of up to 250 messages/second

Member

bernd commented Aug 4, 2015

> FYI I just moved from syslog/TCP to syslog/UDP and immediately saw graylog messages/sec jump by a factor of 4!! We were losing 75% of all TCP syslogs due to this issue

That's really bad. Thanks for the update again. I am on it.

lboue commented Aug 4, 2015

Could TCP keepalive solve this issue, if it prevents too many open TCP sockets?
Add TCP keepalive configuration option for TCP transport
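
For reference, TCP keepalive is a per-socket option that asks the operating system to probe an idle connection so that a dead peer is eventually noticed. The sketch below only shows what enabling it looks like on a plain socket in Python; it is not a Graylog setting, and the helper name `enable_keepalive` is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on TCP keepalive probes for this socket (OS default timings)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Read the option back to confirm it is set; a non-zero value means enabled.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Probe timing (idle time, interval, retry count) is controlled by separate, platform-specific options and is left at the OS defaults here.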

bernd added a commit that referenced this issue Aug 4, 2015

Fix edge case in SyslogOctetCountFrameDecoder
We have to take the skipped bytes (frame size value length + whitespace)
into account when checking if the buffer has enough data to read the
complete message.

This fixes an exception when reading from the buffer that mostly
happened during high load situations.

Fixes #1105
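
One plausible reading of this commit message, sketched in Python rather than the actual Java decoder: when deciding whether a whole frame has arrived, the length prefix and the space after it also occupy bytes in the buffer. The function names below are hypothetical and this is not the Graylog code.

```python
def parse_prefix(buf: bytes):
    """Return (message_length, skipped_bytes), or None if the prefix is incomplete."""
    space = buf.find(b" ")
    if space == -1:
        return None
    length = int(buf[:space].decode("ascii"))
    skipped = space + 1  # digits of the length value plus the separating space
    return length, skipped


def frame_ready_naive(buf: bytes) -> bool:
    """Check that ignores the skipped prefix bytes (the edge case)."""
    prefix = parse_prefix(buf)
    if prefix is None:
        return False
    length, _skipped = prefix
    return len(buf) >= length


def frame_ready(buf: bytes) -> bool:
    """Check that counts the skipped prefix bytes as well."""
    prefix = parse_prefix(buf)
    if prefix is None:
        return False
    length, skipped = prefix
    return len(buf) >= skipped + length


partial = b"5 hel"    # frame announces 5 bytes, only 3 have arrived
complete = b"5 hello"

print(frame_ready_naive(partial), frame_ready(partial))    # True False
print(frame_ready_naive(complete), frame_ready(complete))  # True True
```

The two checks only disagree on a partially received frame: the naive one reports it as ready, and a subsequent read of the announced length would run past the data actually buffered.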

Member

bernd commented Aug 4, 2015

@jhaar @lboue I found the problem in our Syslog handling.

The TCP RST was sent because Graylog closed the connection after an exception was thrown. This just wasn't logged because we were missing an exception logger for the Netty pipeline. (will be fixed in #1340)

See #1339 for a fix. This will be released in Graylog 1.2.0.

Thank you very much for all the debugging information!

@joschi joschi closed this in #1339 Aug 5, 2015

joschi added a commit that referenced this issue Aug 5, 2015

Merge pull request #1339 from Graylog2/issue-1105
Fix edge case in SyslogOctetCountFrameDecoder

Fixes #1105

bernd added a commit that referenced this issue Aug 5, 2015

Fix edge case in SyslogOctetCountFrameDecoder
We have to take the skipped bytes (frame size value length + whitespace)
into account when checking if the buffer has enough data to read the
complete message.

This fixes an exception when reading from the buffer that mostly
happened during high load situations.

Fixes #1105

(cherry picked from commit b20ffab)

Member

bernd commented Aug 6, 2015

@jhaar @lboue We just released Graylog 1.1.6 which contains the fix! https://www.graylog.org/graylog-1-1-6-released/

lboue commented Aug 7, 2015

Thanks a lot for this fix! I will try it soon.

jhaar commented Aug 12, 2015

looking good. No more TCP resets :-)

Member

bernd commented Aug 12, 2015

> looking good. No more TCP resets :-)

Great! Thanks for the feedback!