Added elasticsearch-http() destination #2509

pzoleex · 2019-01-23T11:59:30Z

This destination is based on the native http destination of syslog-ng
and uses elasticsearch bulk api (https://www.elastic.co/guide/en/elasticsearch/reference/6.5/docs-bulk.html)

Example:
destination d_elasticsearch_http {
    elasticsearch-http(
        index("my_index")
        type("my_type")
        url("http://my_elastic_server:9200/_bulk")
    );
};

Signed-off-by: Zoltan Pallagi <pzoleex@gmail.com>

kira-syslogng · 2019-01-23T12:22:43Z

Build SUCCESS

faxm0dem · 2019-01-23T18:36:18Z

this is great, thanks!
two questions:

are type and index templateable?
is type optional? it should be, as it is deprecated in ES 6

kira-syslogng · 2019-01-24T08:34:21Z

Build SUCCESS

pzoleex · 2019-01-24T08:39:34Z

Hi,

Thanks for your feedback!

> this is great, thanks!
> two questions:
>
> are type and index templateable?

Yes, you can use templates (macros) in the type and index fields, but it is at your own risk if the resolved template contains disallowed characters. syslog-ng has no protection for this case, so always make sure that the resolved template contains only characters that Elasticsearch allows.

> is type optional? it should be, as it is deprecated in ES 6

In this solution, you can set it to an empty string (""), which makes the field an empty string, but you cannot remove it from the index completely.
They will remove the type completely in ES 8, but even the stable 7.0.0 is not yet released, so we should focus on the existing ES 5 and 6 for now.
Looking at their EOL table (https://www.elastic.co/support/eol), 5.x will be EOL once 7.0 is released, so at that point we can change the default index creation and remove the type (update the SCL, or create another SCL for ES 7).

bazsi · 2019-01-24T09:00:17Z

Can't we remove the previous elasticsearch destinations in favour of this one? I mean the best would be if we only had one elasticsearch() and we would change the implementation. Or it can't be compatible? Thanks

On Thu, Jan 24, 2019 at 8:15 AM Kókai Péter wrote, in a review comment on scl/elasticsearch/plugin.conf (block destination elasticsearch-http-bulk):

> I would split this elasticsearch-http-bulk into a different file: scl/elasticsearch/es-http-bulk.conf.
> 1. The @requires are *global* for a file: when you have a @requires and that module is not present, everything after it won't be parsed. But the modules needed by the two elasticsearch destinations are different.
> 2. They solve a similar issue (hence the same dir) but in a different way (hence a different file).

pzoleex · 2019-01-24T09:24:19Z

@bazsi Yes, this is my plan, but I want to do it in a different PR.
ES 1 and 2 have already been declared EOL by Elastic, so we can do it safely.

@faxm0dem, Bazsi:
I'm thinking about not putting index and type in the body, and letting the user give them in the url instead.
It's not so nice, because the body will contain an empty index (index:{}, it is a required field), and the user must give the index and type for every server instead of specifying them only once.
But this way we can avoid an API break, because the user can decide whether to use type or not.

In that case the url would look like this:
url("http://my_elastic_server:9200/my_index/my_type/_bulk http://my_elastic_server2:9200/my_index/my_type/_bulk"));

instead of
url("http://my_elastic_server:9200/_bulk
http://my_elastic_server2:9200/_bulk"));

and when ES 7 or 8 is released, the user can change it to (no type):
url("http://my_elastic_server:9200/my_index/_bulk"));

What do you think about it?

faxm0dem · 2019-01-27T19:54:55Z

no, don't do that: most ppl use time-based indices, e.g. syslog-${YEAR}.${MONTH}.${DAY} so what would happen at day break with a batch containing both messages from yesterday and today?
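
To make the concern concrete, here is a minimal Python sketch (not syslog-ng code) of a bulk request body in which the index is resolved per message instead of being taken from the URL. The `bulk_body` helper, the `unixtime` field and the index pattern are hypothetical names chosen for illustration; only the newline-delimited action/source line pairs follow the Elasticsearch bulk API.

```python
import json
from datetime import datetime, timezone

def bulk_body(messages, index_pattern="syslog-%Y.%m.%d"):
    """Build an NDJSON bulk body, resolving the index name per message."""
    lines = []
    for msg in messages:
        ts = datetime.fromtimestamp(msg["unixtime"], tz=timezone.utc)
        # Action line: each document names its own target index.
        lines.append(json.dumps({"index": {"_index": ts.strftime(index_pattern)}}))
        # Source line: the document itself.
        lines.append(json.dumps({"message": msg["message"]}))
    # The bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n"

# One batch straddling midnight UTC between 2019-01-27 and 2019-01-28.
batch = [
    {"unixtime": 1548633599, "message": "last message of the day"},
    {"unixtime": 1548633600, "message": "first message of the next day"},
]
print(bulk_body(batch))
```

With the index in each action line, the two messages land in different daily indices even though they travel in one request; with the index fixed in the URL, the whole batch would go to a single index.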

bazsi · 2019-01-28T17:46:34Z

Hi, I am not sure I follow, surely http() is capable of doing time-based indexes. Or is it the proposed SCL wrapper that misses that? Thanks for your clarification. Bazsi

pzoleex · 2019-01-31T10:33:53Z

So the changes:
using the omit-empty-values feature (#2519), we can solve the type problem. If type contains an empty string, it won't be generated for the index. That way you can enable or disable the type with this option.
I can also add the custom-id function, which is optional.
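
The effect of omitting empty values on the bulk action line can be sketched in Python as follows. This only illustrates the intended outcome; it is not how the SCL or #2519 is implemented, and the `action_line` helper and its parameters are hypothetical.

```python
import json

def action_line(index, doc_type="", custom_id=""):
    """Return the bulk action line, leaving out keys whose value is empty."""
    meta = {"_index": index, "_type": doc_type, "_id": custom_id}
    # Drop empty strings so that an empty type or an unset id simply disappears.
    meta = {key: value for key, value in meta.items() if value != ""}
    return json.dumps({"index": meta})

print(action_line("my_index", "my_type"))  # action line with _index and _type
print(action_line("my_index"))             # action line with _index only
```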

kira-syslogng · 2019-01-31T10:54:47Z

Build SUCCESS

faxm0dem · 2019-01-31T12:16:01Z

I'd love to test this before it gets merged, if that's fine by you

pzoleex · 2019-02-02T08:03:24Z

> I'd love to test this before it gets merged, if that's fine by you

That would be great! Just don't forget to apply #2519 before using this SCL.

faxm0dem · 2019-02-04T14:20:50Z

the current elastic-v2 destination allows for load-balancing.
I just realised #2433 was closed : does that mean the http destination doesn't allow multiple urls?

EDIT: sorry, I just saw that the load-balancing code is there since 3.19 so multiple urls are fine

faxm0dem · 2019-02-04T14:49:47Z

I just tried to set a URL string list but syslog-ng doesn't like it:

destination d_coloss {
  elasticsearch-http(
    url("https://node1.example.com:9200 https://node2.example.com:9200")
    index("test-${YEAR}-${MONTH}-${DAY}")
    time-zone("UTC")
    type("test")
    workers(4)
    batch_lines(16)
    timeout(10)
    tls(
      ca-file("/path/to/chain-ca.pem")
      cert-file("/path/to/syslog_ng.crt")
      key-file("/path/to/syslog_ng.key")
      peer-verify(yes)
    )
  );
};

The parser doesn't seem happy about the string list:

[2019-02-04T15:43:28.335193] curl: error sending HTTP request; url='https://node1.example.com:9200 https://node2.example.com:9200', error='Couldn\'t resolve host name', worker_index='0', driver='d_coloss#0', location='#buffer:4:3'
[2019-02-04T15:43:28.335241] Load balancer target failed, removing from rotation; url='https://node1.example.com:9200 https://node2.example.com:9200'

faxm0dem · 2019-02-04T15:13:38Z

I just successfully tested using this config:

@version: 3.19
@include "scl.conf"

destination d_elastic {
	elasticsearch-http(
		url("https://node01.example.com:9200/_bulk")
		index("test-${YEAR}-${MONTH}-${DAY}")
		time-zone("UTC")
		type("test")
		workers(4)
		batch_lines(16)
		timeout(10)
		tls(
			ca-file("ca.pem")
			cert-file("syslog_ng.crt.pem")
			key-file("syslog_ng.key.pem")
			peer-verify(yes)
		)
	);
};

log {
	source {
		stdin();
	};
	destination(d_elastic);
};

However, batch_lines doesn't seem to be honored:

[2019-02-04T16:10:59.034661] Incoming log entry; line='fdssdf'
[2019-02-04T16:10:59.099908] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='208', batch_size='1', redirected='0', total_time='0.048', worker_index='0', driver='d_elastic#0', location='#buffer:4:3'
sfsd
[2019-02-04T16:11:01.792067] Incoming log entry; line='sfsd'
[2019-02-04T16:11:01.866980] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='206', batch_size='1', redirected='0', total_time='0.059', worker_index='1', driver='d_elastic#0', location='#buffer:4:3'

pzoleex · 2019-02-04T15:15:04Z

PR updated.
Use this form (and don't forget to use the proper endpoint, _bulk):
url("https://node1.example.com:9200/_bulk" "https://node2.example.com:9200/_bulk")

pzoleex · 2019-02-04T15:20:55Z

> However, batch_lines doesn't seem to be honored:

Send messages faster :)
It will auto-flush if the messages come in slowly (otherwise it would wait forever if, for example, batch_lines=16 and only 2 messages arrived).

If you want to flush exactly at batch_lines(), then set the batch_timeout() parameter.
For example, if you set:
batch_lines(16)
batch_timeout(60)
then it will only flush when there are 16 lines in the batch, or when batch_timeout has expired.

faxm0dem · 2019-02-04T15:24:39Z

I tried batch_timeout, doesn't seem to do anything

kira-syslogng · 2019-02-04T15:38:15Z

Build SUCCESS

pzoleex · 2019-02-05T08:46:51Z

> I tried batch_timeout, doesn't seem to do anything

Sorry, the value for batch-timeout() is in milliseconds. So set it to 60000 for 60 seconds.
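
The flush rule described in this exchange can be modelled with a short Python sketch. It is a simplified model of the behaviour as explained in this thread, not the actual syslog-ng implementation, and both function names are hypothetical.

```python
def should_flush(queued_lines, elapsed_ms, batch_lines=16, batch_timeout_ms=60000):
    """Flush when the batch is full or the timeout (in milliseconds) has expired."""
    return queued_lines >= batch_lines or elapsed_ms >= batch_timeout_ms

def seconds_to_batch_timeout(seconds):
    """Convert seconds to the millisecond value that batch-timeout() expects."""
    return int(seconds * 1000)

# With batch_timeout(60) the timeout is 60 ms, so a slow trickle of messages
# is flushed almost immediately and batches stay small.
print(should_flush(queued_lines=2, elapsed_ms=100, batch_timeout_ms=60))
# With 60000 ms the same two messages keep waiting for more lines.
print(should_flush(queued_lines=2, elapsed_ms=100, batch_timeout_ms=seconds_to_batch_timeout(60)))
```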

faxm0dem · 2019-02-06T08:29:23Z

indeed, it works perfectly, thanks!

I agree with @bazsi that there should be only one elastic destination.
However, it's safer to first obsolete the other destinations in 3.20
So I suggest you add warning messages to all other elastic destinations, that they're obsolete and will be removed in 3.21. Then we'll "promote" this one by renaming it to elasticsearch, which will make sense as I'll have tested it in production ;)

bazsi · 2019-02-06T08:40:46Z

any numbers you can share? CPU usage, eps etc would be very interesting. Thanks

faxm0dem · 2019-02-06T11:39:29Z

I'll test it using the production workload ASAP and report back here

faxm0dem · 2019-02-13T09:03:14Z

@bazsi I managed to test this in production (12 fairly recent PowerEdge machines with 10k spinning SATA disks running Elasticsearch, 1 large VM with 8VCPU running syslog-ng and loggen).

The highest throughput I could get was 30'000 messages per second:

loggen

$ loggen -r 50000 -I 120 -P -S 127.0.0.1 514 --active-connections=8
...
average rate = 31637.14 msg/sec, count=3846907, time=121.595, (average) msg size=260, bandwidth=8032.87 kB/sec

The VM's load was pretty high during the longest tests: between 0.9 and 1.1 normalized.
The Elasticsearch nodes' load was between 0.1 and 0.4 normalized.
The syslog-ng queue contained no more than 4000 messages and was very low.

syslog-ng config

source s_local_tcp {
    network(
        transport(tcp)
        port(514)
        ip(127.0.0.1)
        flags(syslog-protocol)
    );
};

elasticsearch-http(
  workers(12)
  batch_lines(1024)
  batch_timeout(10000)
  timeout(10)
  index("test-syslog_ng-elastic-http")
  url("https://node221.example.com:9200/_bulk" "https://node222.example.com:9200/_bulk" "https://node223.example.com:9200/_bulk" "https://node05.example.com:9200/_bulk" "https://node08.example.com:9200/_bulk" "https://node27.example.com:9200/_bulk" "https://node53.example.com:9200/_bulk" "https://node54.example.com:9200/_bulk" "https://node55.example.com:9200/_bulk" "https://node83.example.com:9200/_bulk" "https://node84.example.com:9200/_bulk" "https://node85.example.com:9200/_bulk")
  template("$(format-json -s all-nv-pairs -x __* -x tmp.* -x SOURCE -x PROGRAM -x MESSAGE -x PID -x HOST_FROM -x HOST -x LEGACY_MSGHDR -p uniqid=$UNIQID --rekey timestamp --add-prefix @ --rekey .classifier.* --add-prefix pdb --rekey .SDATA.auto.* --shift 12 --rekey .SDATA.* --shift 7 --rekey .* --shift 1)")
  time-zone("UTC")
  type("syslog")
  tls (
    ca-file('/etc/elasticsearch/coloss/ca.pem')
    cert-file('/etc/syslog-ng/coloss-analyzer.crt')
    key-file('/etc/syslog-ng/coloss-analyzer.key')
    peer-verify(yes)
  )
  disk-buffer(reliable(no) dir("/var/lib/syslog-ng-disq/") disk-buf-size(53687091200) mem-buf-length(200
  `__VARARGS__`
);

elastic

Index settings:

# GET /test-syslog_ng-elastic-http/_settings
{
  "test-syslog_ng-elastic-http" : {
    "settings" : {
      "index" : {
        "creation_date" : "1550046604958",
        "number_of_shards" : "12",
        "number_of_replicas" : "0",
        "uuid" : "mJQlJQerRhmUuEvzgc-flw",
        "version" : {
          "created" : "6030299"
        },
        "provided_name" : "test-syslog_ng-elastic-http"
      }
    }
  }
}

As you can see the index spans all 12 nodes:

# GET /_cat/shards | grep test-syslog_ng
test-syslog_ng-elastic-http      8  p STARTED   320103  151.6mb 10.0.238.222 node222
test-syslog_ng-elastic-http      4  p STARTED   320175  151.5mb 10.0.105.59  node08
test-syslog_ng-elastic-http      2  p STARTED   320333  151.2mb 10.0.104.70  node53
test-syslog_ng-elastic-http      9  p STARTED   320612  151.5mb 10.0.108.89  node84
test-syslog_ng-elastic-http      6  p STARTED   320635  151.4mb 10.0.238.221 node221
test-syslog_ng-elastic-http      3  p STARTED   320948  151.6mb 10.0.108.2   node55
test-syslog_ng-elastic-http      11 p STARTED   320125  151.5mb 10.0.108.93  node85
test-syslog_ng-elastic-http      7  p STARTED   321130  151.4mb 10.0.108.88  node83
test-syslog_ng-elastic-http      1  p STARTED   321431  151.6mb 10.0.105.97  node27
test-syslog_ng-elastic-http      10 p STARTED   320441  151.4mb 10.0.238.223 node223
test-syslog_ng-elastic-http      5  p STARTED   320062  151.2mb 10.0.108.1   node54
test-syslog_ng-elastic-http      0  p STARTED   320912  151.4mb 10.0.104.140 node05

Moreover, all documents were written, no lost message:

# GET /test-syslog_ng-elastic-http/_count
{"count":3846907,"_shards":{"total":12,"successful":12,"skipped":0,"failed":0}}

It seems the ES cluster isn't saturated, but the syslog-ng node is.
I'm guessing this is due to the VM's size, and TLS overhead.

Still, 30keps seems pretty decent, and that's 10 times more than we need in production.

bazsi · 2019-02-13T09:27:45Z

Thanks a lot for these numbers. Really appreciated. I am somewhat disappointed by these numbers though. I think the primary reasons are:

* your $(format-json) command line is pretty complex, unfortunately value-pairs have performance issues, which I wanted to fix for a time now. The command line you are using is a very nice realistic use-case to improve performance on. Still you are running this on 8 cores, so if this would be the bottleneck then the per-thread performance is roughly 3500-4000 msg/sec. Which is really disappointing.
* disk buffer also has an impact, but your config partitions the traffic into 12 queues, thus the disk buffer performance issues are probably not the bottleneck. disk-wise the 30MB/sec does not seem too demanding.
* TLS: shouldn't be an issue, 30MB/sec is not something a recent CPU wouldn't be able to handle.

So, with all that said, I am pretty sure this can and should be improved in the future and your config example will help me guide the performance tuning.

Thanks
Bazsi
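
For reference, the per-thread estimate can be reproduced from the loggen figures reported earlier in this thread. This is a back-of-the-envelope Python sketch that assumes the load is spread evenly over the 8 vCPUs; the variable names are illustrative.

```python
average_rate = 31637.14   # msg/sec, from the loggen summary line
vcpus = 8                 # size of the VM running syslog-ng and loggen

per_core_rate = average_rate / vcpus
print(round(per_core_rate))  # 3955 msg/sec per core
```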

faxm0dem · 2019-02-13T09:31:45Z

I did an additional test: simulating the failure of a URL. This resulted in duplicated documents in Elasticsearch: the difference between sent messages and indexed messages was a multiple of the batch size.
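
Why a resent batch shows up as a multiple of the batch size, and how a deterministic document id would change that, can be sketched in Python. This is a toy model: a dict stands in for an Elasticsearch index, `index_batch` and the `uniqid` field are hypothetical, and the only Elasticsearch behaviour assumed is that indexing with an existing `_id` overwrites the document instead of adding a new one. Whether the optional custom-id feature mentioned above was used in this test is not stated.

```python
def index_batch(store, batch, use_custom_id):
    """Index a batch into `store`, a dict standing in for an ES index."""
    for doc in batch:
        if use_custom_id:
            doc_id = doc["uniqid"]            # deterministic id taken from the message
        else:
            doc_id = "auto-%d" % len(store)   # stand-in for a server-generated id
        store[doc_id] = doc

batch = [{"uniqid": "msg-%d" % i, "message": "hello"} for i in range(16)]

auto_ids, custom_ids = {}, {}
for store, use_custom_id in ((auto_ids, False), (custom_ids, True)):
    index_batch(store, batch, use_custom_id)
    index_batch(store, batch, use_custom_id)  # same batch resent after a failure

print(len(auto_ids), len(custom_ids))  # 32 16
```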

faxm0dem · 2019-02-13T09:57:25Z

@bazsi I replaced the template with format-json -s all-nv-pairs and removed the disk-buffer.
The performance is more or less the same.
I'll try with a larger maybe physical node if needed.

EDIT: there was a syntax error in my config. The correct conclusion is: there is little influence on the performance when simplifying the template, but disabling the unreliable disk-buffer significantly increases performance: I can easily get 48k/s

pzoleex · 2019-02-13T11:15:07Z

@faxm0dem Fabien, thanks for the tests!
can you share the network source configuration as well?
workers(12)
batch_lines(1024)
batch_timeout(10000)
I assume you are using flow-control.

With this configuration, the log-iw-size() of the network source should be 12 * 1024 * the number of active TCP connections (8), so log-iw-size(98304) or higher is the optimal value.

This is because batching has a side effect when batch_timeout() is set: it won't flush until batch_lines() or batch_timeout() is reached, but every message in the batch decreases the log_iw of the source.
In practice, if batch_lines(1024) is set and log_iw is, for example, just 100, then after 100 incoming messages the source is suspended, and the destination only flushes when the timeout elapses, because the batch never reaches 1024 messages.
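
The sizing rule above is plain arithmetic; a small Python sketch makes it easy to redo for other settings. The function names are hypothetical, and the second helper only encodes the stall condition as described here, not syslog-ng's actual flow-control code.

```python
def suggested_log_iw_size(workers, batch_lines, active_connections):
    """Window size that lets every worker fill a full batch for every connection."""
    return workers * batch_lines * active_connections

def batch_can_fill(log_iw, batch_lines):
    """False when the source is suspended before a batch reaches batch_lines."""
    return log_iw >= batch_lines

print(suggested_log_iw_size(workers=12, batch_lines=1024, active_connections=8))  # 98304
print(batch_can_fill(log_iw=100, batch_lines=1024))  # False: flush waits for the timeout
```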

And another tip: you could try it without the syslog protocol; there is currently a performance issue when the syslog protocol is used.

faxm0dem · 2019-02-13T11:34:42Z

I updated my comment to add the network source options. I didn't set any log_iw or log_iw_size explicitly, but I don't know the default values.

@pzoleex what do you suggest to replace syslog() with?

faxm0dem · 2019-02-13T12:01:42Z

I managed to get 57k/s by just simplifying my log statement (removed many parsers, rewrites).
Setting log_iw_size to what @pzoleex suggested didn't improve anything

bazsi · 2019-02-13T16:40:20Z

The bottleneck seems to be the elasticsearch cluster then. Maybe it throttles incoming HTTP requests somehow? Can you increase the number of workers to a multiple of 12? You have 12 nodes, so increasing the threads to a multiple of that means that we would send requests to each node from multiple threads. Also, I guess elastic nodes would need to replicate the indexes somehow, so the best would be if our routing to nodes would be the same as how shards are split in ES. Do you know if there's any best practice how we should target nodes in an ES cluster for better data locality?

bazsi · 2019-02-13T16:50:18Z

Hm... first of all I wrote this before seeing PZolee's answer.

This is what I meant by routing: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html

This means that the document _id will be used for routing, i.e. that will determine where the document gets indexed. We should be able to route to the proper node accordingly.

faxm0dem · 2019-02-13T21:10:25Z

The way I understand ES, if number_of_shards is a multiple of the number of data nodes, and if we let ES compute its own _id, everything should be balanced equally across all nodes. I have detailed monitoring, and during the tests all nodes seemed to ingest more or less the same workload. I did see a systematic 10% difference between one group of nodes and the other, which I'll investigate later, but I think it's due to our production workload, which is being tiered.

nbari · 2019-03-07T21:56:49Z

In which version is elasticsearch-http available? I am currently trying 3.20 on FreeBSD but getting this error:

Error parsing destination statement, destination plugin elasticsearch-http not found

Is it only available through curl, or are java and java_mod required?

MrAnno · 2019-03-07T22:01:23Z

@nbari Hopefully, it will be included in the next release (version 3.21).
Until then, the elasticsearch2() Java driver can be used in http mode.

elasticsearch-http() will be completely native, curl-based.

pzoleex · 2019-03-11T10:25:35Z

wip flag removed because #2519 already merged

pzoleex · 2019-03-11T10:26:03Z

@kira-syslogng test this please;

kira-syslogng · 2019-03-11T10:49:57Z

Build SUCCESS

alltilla reviewed Jan 23, 2019

alltilla mentioned this pull request Jan 23, 2019

segfault in 3.18.1+ latest master #2466

Closed

Kokan reviewed Jan 24, 2019

pzoleex force-pushed the pzolee-elasticsearch-http branch from a23cb07 to 010109d Compare January 24, 2019 08:11

pzoleex changed the title ~~Added elasticsearch-http-bulk() destination~~ WIP: Added elasticsearch-http-bulk() destination Jan 30, 2019

pzoleex force-pushed the pzolee-elasticsearch-http branch from 010109d to d576968 Compare January 31, 2019 10:30

pzoleex changed the title ~~WIP: Added elasticsearch-http-bulk() destination~~ WIP: Added elasticsearch-http() destination Jan 31, 2019

pzoleex force-pushed the pzolee-elasticsearch-http branch from d576968 to 57c3688 Compare January 31, 2019 10:31

Kokan added the user-visible-feature label Feb 4, 2019

pzoleex force-pushed the pzolee-elasticsearch-http branch from 57c3688 to 381ceb1 Compare February 4, 2019 15:14

pzoleex changed the title ~~WIP: Added elasticsearch-http() destination~~ Added elasticsearch-http() destination Mar 11, 2019

Kokan added this to the syslog-ng-3.21 milestone Mar 11, 2019

Kokan approved these changes Mar 12, 2019

lbudai approved these changes Mar 12, 2019

lbudai merged commit 3e9f5d7 into syslog-ng:master Mar 12, 2019

pzoleex mentioned this pull request Mar 14, 2019

elasticsearch2: Added deprecation warning #2628

Merged

gaborznagy mentioned this pull request May 9, 2019

Forward Logs from syslog-ng to ELK #2718

Closed
