Very slow process message #1334

Closed
dangmocrang opened this Issue Aug 3, 2015 · 5 comments

dangmocrang commented Aug 3, 2015

Hi,
My server has 64 GB of RAM (32 GB for Elasticsearch and 32 GB for the OS).
I receive around 300–1,200 msgs/s (with over 100 streams, each filtering on gl2_remote_ip, no wildcards used). A few days ago my process buffer filled up to 100% utilization (ring_size is 65536) and it has stayed at 100% ever since, while "messages read in the last second" is very low, around 20–500 msgs/s.
At this processing speed my journal keeps growing, and I now have 23 million unprocessed messages.
At first I was running Graylog 1.1.3 and Elasticsearch 1.6.0. I upgraded to Graylog 1.1.5 and Elasticsearch 1.6.1 this morning, hoping that would fix the problem, but it behaves the same as the previous version.
I had to increase the journal size to 50 GB to keep my logs, but at this rate it will be full soon.
I also tried changing processbuffer_processors to 10, 15, and 20, but that did not fix the problem.

The strange thing is that right after I restart Graylog, before the process buffer fills up, the read rate from the journal is very high, up to 10,000 msgs/s, but once the buffer reaches 100% utilization it slows down.

What should I do now?
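For reference, the settings mentioned above live in Graylog's server.conf. A minimal sketch with the values from this report (the journal directory shown is the package default and is only an assumption here):

# Ring buffer in front of the processors; must be a power of 2.
ring_size = 65536

# Number of parallel processors draining the process buffer.
# Raise this number if your buffers are filling up.
processbuffer_processors = 10

# On-disk journal that absorbs messages the processors cannot keep up with.
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 50gb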

dangmocrang commented Aug 3, 2015

I found the problem. I removed my extractors and the processed-message rate increased, so this may be because of the extractors... but on a machine this powerful? Can some extractors slow it down that much?

joschi commented Aug 3, 2015

If you're using the Regular Expression or Grok extractors, you can easily create expressions (e.g. ones using a lot of backtracking) which will run very slowly or even never complete at all.

Could you share the extractor in question?
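To illustrate the kind of expression that can blow up (a made-up example, not from this thread): a pattern with a nested quantifier such as

(a+)+b

takes exponential time on a line like "aaaaaaaaaaaaaaaaaaaaaaaaaa!" because the engine has to try every way of splitting the run of a's between the inner and outer repetition before it can report a failed match. The equivalent pattern without the nesting,

a+b

matches exactly the same strings but fails immediately.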

dangmocrang commented Aug 3, 2015

Oh dear... All of my extractors are Grok and Regular Expression. How can I deal with this? Use another type like "Split & Index" or "Substring"?
Is there any way to speed things up if I still want to use Grok and regular expressions?

One of my grok patterns:

grok_pattern: %{DATA:UNWANTED} user_name\=\"%{DATA:userName;string}\" user_gp=\"%{DATA:userGroup;string}\" %{DATA:UNWANTED} category_type\=\"%{DATA:categoryType}\" %{DATA:UNWANTED} src_ip\=%{IPV4:sourceIP;string} dst_ip\=%{IPV4:destinationIP;string} protocol\=\"%{DATA:protocol;string}\" src_port\=%{BASE10NUM:sourcePort;int} dst_port\=%{BASE10NUM:destinationPort;int} 
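Each %{DATA} in the pattern above expands to a lazy .*?, and several of them chained together force a lot of backtracking whenever a line does not match. A possible way to tighten it (only a sketch, assuming the quoted values never contain double quotes and that this Graylog version's grok accepts inline named captures like (?<name>...) the way Logstash's grok does):

grok_pattern: %{DATA:UNWANTED} user_name\=\"(?<userName>[^"]*)\" user_gp\=\"(?<userGroup>[^"]*)\" %{DATA:UNWANTED} category_type\=\"(?<categoryType>[^"]*)\" %{DATA:UNWANTED} src_ip\=%{IPV4:sourceIP;string} dst_ip\=%{IPV4:destinationIP;string} protocol\=\"(?<protocol>[^"]*)\" src_port\=%{BASE10NUM:sourcePort;int} dst_port\=%{BASE10NUM:destinationPort;int}

The leading %{DATA:UNWANTED} skips are still lazy, so it also helps to run the extractor only on messages that can actually match, e.g. with an extractor condition such as "only attempt extraction if the field contains user_name=". For a strictly key=value format like this, the Split & Index extractor mentioned above avoids regex matching entirely, at the cost of one extractor per field.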

joschi commented Aug 3, 2015

Not every regular expression or grok pattern is slow, but if you add lots of those to a single input, processing speed naturally takes a hit.

I'm closing this issue now because it's neither a bug report nor a feature request. Please use our mailing list or our IRC channel #graylog on Freenode for general questions around Graylog.

Related mailing list post: https://groups.google.com/d/msg/graylog2/ywNR7CGPAi0/R4E53NsAHwAJ

joschi closed this Aug 3, 2015

zez3 commented Oct 4, 2016

We have found that increasing the number of processbuffer and outputbuffer processors helps:

#  The number of parallel running processors.
#  Raise this number if your buffers are filling up.
processbuffer_processors = 20
outputbuffer_processors = 15

I must say that we have a lot of grok and regex patterns configured, and we cannot just drop them.
We increased the buffer processors threefold and saw the expected major increase in CPU usage and output rate while there were still millions of messages queued in the journal. After that it is pretty much idling and no accumulation is happening anymore.
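As a rough sketch of how these settings might be sized together (the 40-core node is hypothetical, used only for illustration; the usual advice is to keep the total number of buffer processors at or below the number of available CPU cores):

# Hypothetical 40-core node running the values above
inputbuffer_processors = 2
processbuffer_processors = 20
outputbuffer_processors = 15
# 2 + 20 + 15 = 37 processor threads, leaving some headroom for the rest of the JVM and the OS.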
