Added elasticsearch-http() destination #2509

Merged 1 commit on Mar 12, 2019

pzoleex
Collaborator

@pzoleex pzoleex commented Jan 23, 2019

This destination is based on the native http() destination of syslog-ng
and uses the Elasticsearch bulk API (https://www.elastic.co/guide/en/elasticsearch/reference/6.5/docs-bulk.html).

Example:
destination d_elasticsearch_http {
    elasticsearch-http(
        index("my_index")
        type("my_type")
        url("http://my_elastic_server:9200/_bulk")
    );
};

Signed-off-by: Zoltan Pallagi <pzoleex@gmail.com>

@kira-syslogng
Contributor

Build SUCCESS

@faxm0dem
Contributor

this is great, thanks!
two questions:

  1. are type and index templateable?
  2. is type optional? it should be, as it is deprecated in ES 6

@kira-syslogng
Contributor

Build SUCCESS

@pzoleex
Collaborator Author

pzoleex commented Jan 24, 2019

Hi,

Thanks for your feedback!

> this is great, thanks!
> two questions:
>
>   1. are type and index templateable?

Yes, you can use templates (macros) in the type and index fields, but it is at your own risk: if the resolved template contains characters that are not allowed, syslog-ng offers no protection. Always make sure that the resolved template can only contain characters that Elasticsearch allows.

>   2. is type optional? it should be, as it is deprecated in ES 6

With this solution you can set it to an empty string (""), which makes the field an empty string, but you cannot remove it from the index completely.
They will remove the type completely in ES 8, but even the stable 7.0.0 is not released yet, so we should focus on the existing ES 5 and 6 for now.
Looking at their EOL table (https://www.elastic.co/support/eol), 5.x will be EOL when 7.0 is released, so at that point we can change the default index creation and remove the type (update the SCL, or create another SCL for ES 7).

@bazsi
Collaborator

bazsi commented Jan 24, 2019 via email

@pzoleex
Collaborator Author

pzoleex commented Jan 24, 2019

@bazsi Yes, this is my plan, but I want to do it in a different PR.
ES 1 and 2 have already been declared EOL by Elastic, so we can do it safely.

@faxm0dem, Bazsi:
I'm thinking about not putting index and type in the body, and letting the user give them in the URL instead.
It's not so nice, because the body will contain an empty index action (index:{}, which is a required field), and the user must give the index and type for every server instead of specifying them only once.
But this way we can avoid an API break, because the user can decide whether to use type or not.

In that case the URL would be:
url("http://my_elastic_server:9200/my_index/my_type/_bulk http://my_elastic_server2:9200/my_index/my_type/_bulk"));

instead of
url("http://my_elastic_server:9200/_bulk
http://my_elastic_server2:9200/_bulk"));

and when ES 7 or 8 is released, the user can change it to (no type):
url("http://my_elastic_server:9200/my_index/_bulk"));

What do you think about it?

@faxm0dem
Contributor

no, don't do that: most people use time-based indices, e.g. syslog-${YEAR}.${MONTH}.${DAY}, so what would happen at the day boundary with a batch containing messages from both yesterday and today?
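
To illustrate the concern, here is a minimal Python sketch of a bulk request body in which every message carries its own action line. It is not the SCL's implementation; the function name `bulk_body`, the field names `@timestamp` and `message`, and the index pattern are assumptions made for the example. Because the index is resolved per message, a batch that straddles midnight can still land in two different daily indices, which a single index in the URL cannot express.

```python
import json
from datetime import datetime, timezone


def bulk_body(messages, index_pattern="syslog-%Y.%m.%d", doc_type="syslog"):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each message gets its own action line, so the target index is
    resolved per message rather than once per request.
    """
    lines = []
    for ts, text in messages:
        # Hypothetical daily index name derived from the message timestamp.
        action = {"index": {"_index": ts.strftime(index_pattern), "_type": doc_type}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"@timestamp": ts.isoformat(), "message": text}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


batch = [
    (datetime(2019, 1, 23, 23, 59, 59, tzinfo=timezone.utc), "last message of the day"),
    (datetime(2019, 1, 24, 0, 0, 1, tzinfo=timezone.utc), "first message of the next day"),
]
print(bulk_body(batch), end="")
```

The first action line targets syslog-2019.01.23 and the second targets syslog-2019.01.24, even though both documents travel in the same request.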

@bazsi
Collaborator

bazsi commented Jan 28, 2019 via email

@pzoleex pzoleex changed the title Added elasticsearch-http-bulk() destination WIP: Added elasticsearch-http-bulk() destination Jan 30, 2019
@pzoleex pzoleex changed the title WIP: Added elasticsearch-http-bulk() destination WIP: Added elasticsearch-http() destination Jan 31, 2019
@pzoleex
Collaborator Author

pzoleex commented Jan 31, 2019

So the changes:
Using the omit-empty-values feature (#2519), we can solve the type problem: if type is an empty string, it won't be generated in the index action. That way you can enable or disable the type through this option.
I can also add the custom-id option, which is optional.
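
A rough Python sketch of the idea, assuming the action line is built from index, type and an optional id: fields whose value is an empty string are simply left out. The helper `action_line` is hypothetical and only mimics what omit-empty-values is described as doing here; it is not the code from #2519.

```python
import json


def action_line(index, doc_type="", custom_id=""):
    """Return the bulk action line, leaving out metadata fields that are empty."""
    meta = {"_index": index, "_type": doc_type, "_id": custom_id}
    # Drop every field whose value is the empty string.
    meta = {key: value for key, value in meta.items() if value != ""}
    return json.dumps({"index": meta})


print(action_line("my_index", "my_type"))
print(action_line("my_index"))
```

With an empty type the action line contains only `_index`, so the same configuration can serve clusters with and without mapping types.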

@kira-syslogng
Contributor

Build SUCCESS

@faxm0dem
Contributor

I'd love to test this before it gets merged, if that's fine by you

@pzoleex
Collaborator Author

pzoleex commented Feb 2, 2019

> I'd love to test this before it gets merged, if that's fine by you

That would be great! Just don't forget to apply #2519 before using this SCL.

@faxm0dem
Contributor

faxm0dem commented Feb 4, 2019

the current elastic-v2 destination allows for load-balancing.
I just realised #2433 was closed : does that mean the http destination doesn't allow multiple urls?

EDIT: sorry, I just saw that the load-balancing code is there since 3.19 so multiple urls are fine

@faxm0dem
Contributor

faxm0dem commented Feb 4, 2019

I just tried to set an url string list but syslog-ng doesn't like it:

destination d_coloss {
  elasticsearch-http(
    url("https://node1.example.com:9200 https://node2.example.com:9200")
    index("test-${YEAR}-${MONTH}-${DAY}")
    time-zone("UTC")
    type("test")
    workers(4)
    batch_lines(16)
    timeout(10)
    tls(
      ca-file("/path/to/chain-ca.pem")
      cert-file("/path/to/syslog_ng.crt")
      key-file("/path/to/syslog_ng.key")
      peer-verify(yes)
    )
  );
};

The parser doesn't seem happy about the string list:

[2019-02-04T15:43:28.335193] curl: error sending HTTP request; url='https://node1.example.com:9200 https://node2.example.com:9200', error='Couldn\'t resolve host name', worker_index='0', driver='d_coloss#0', location='#buffer:4:3'
[2019-02-04T15:43:28.335241] Load balancer target failed, removing from rotation; url='https://node1.example.com:9200 https://node2.example.com:9200'

@faxm0dem
Contributor

faxm0dem commented Feb 4, 2019

I just successfully tested using this config:

@version: 3.19
@include "scl.conf"

destination d_elastic {
	elasticsearch-http(
		url("https://node01.example.com:9200/_bulk")
		index("test-${YEAR}-${MONTH}-${DAY}")
		time-zone("UTC")
		type("test")
		workers(4)
		batch_lines(16)
		timeout(10)
		tls(
			ca-file("ca.pem")
			cert-file("syslog_ng.crt.pem")
			key-file("syslog_ng.key.pem")
			peer-verify(yes)
		)
	);
};

log {
	source {
		stdin();
	};
	destination(d_elastic);
};

However, batch_lines doesn't seem to be honored:

[2019-02-04T16:10:59.034661] Incoming log entry; line='fdssdf'
[2019-02-04T16:10:59.099908] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='208', batch_size='1', redirected='0', total_time='0.048', worker_index='0', driver='d_elastic#0', location='#buffer:4:3'
sfsd
[2019-02-04T16:11:01.792067] Incoming log entry; line='sfsd'
[2019-02-04T16:11:01.866980] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='206', batch_size='1', redirected='0', total_time='0.059', worker_index='1', driver='d_elastic#0', location='#buffer:4:3'

@pzoleex
Collaborator Author

pzoleex commented Feb 4, 2019

PR updated.
Use this form (and don't forget to use the proper endpoint, _bulk):
url("https://node1.example.com:9200/_bulk" "https://node2.example.com:9200/_bulk")

@pzoleex
Collaborator Author

pzoleex commented Feb 4, 2019

> However, batch_lines doesn't seem to be honored:
>
> [2019-02-04T16:10:59.034661] Incoming log entry; line='fdssdf'
> [2019-02-04T16:10:59.099908] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='208', batch_size='1', redirected='0', total_time='0.048', worker_index='0', driver='d_elastic#0', location='#buffer:4:3'
> sfsd
> [2019-02-04T16:11:01.792067] Incoming log entry; line='sfsd'
> [2019-02-04T16:11:01.866980] curl: HTTP response received; url='https://node01.example.com:9200/_bulk', status_code='200', body_size='206', batch_size='1', redirected='0', total_time='0.059', worker_index='1', driver='d_elastic#0', location='#buffer:4:3'

Send messages faster :)
It auto-flushes if the messages arrive slowly (otherwise it would wait forever if, for example, batch_lines=16 and only 2 messages arrived).

If you want to flush exactly at batch_lines(), set the batch_timeout() parameter.
For example, if you set:
batch_lines(16)
batch_timeout(60)
then it will only flush when there are 16 lines in the batch or the batch_timeout has expired.

@faxm0dem
Contributor

faxm0dem commented Feb 4, 2019

I tried batch_timeout, doesn't seem to do anything

@kira-syslogng
Contributor

Build SUCCESS

@pzoleex
Collaborator Author

pzoleex commented Feb 5, 2019

> I tried batch_timeout, doesn't seem to do anything

Sorry, the value of batch-timeout() is in milliseconds, so set it to 60000 for 60 seconds.
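
The effect of the unit can be seen in a small Python model of the flush rule as described in this thread (flush when the batch is full or the timeout has expired). The function `should_flush` is a simplified, hypothetical stand-in and not syslog-ng's actual code.

```python
def should_flush(pending_lines, elapsed_ms, batch_lines, batch_timeout_ms):
    """Flush when the batch is full or the timeout (in milliseconds) has expired."""
    if pending_lines == 0:
        return False  # nothing to send
    return pending_lines >= batch_lines or elapsed_ms >= batch_timeout_ms


# Two messages waiting, one second after the first one arrived.
print(should_flush(2, 1000, batch_lines=16, batch_timeout_ms=60))     # 60 ms: already expired
print(should_flush(2, 1000, batch_lines=16, batch_timeout_ms=60000))  # 60 s: keeps waiting
print(should_flush(16, 1000, batch_lines=16, batch_timeout_ms=60000)) # batch is full
```

Under this model, batch_timeout(60) expires after 60 ms, so slowly typed messages are still sent one by one, which would look as if the option did nothing.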

@faxm0dem
Contributor

faxm0dem commented Feb 6, 2019

indeed, it works perfectly, thanks!

I agree with @bazsi that there should be only one elastic destination.
However, it's safer to first mark the other destinations as obsolete in 3.20.
So I suggest you add warning messages to all other elastic destinations, saying that they're obsolete and will be removed in 3.21. Then we'll "promote" this one by renaming it to elasticsearch, which will make sense as I'll have tested it in production ;)

@bazsi
Collaborator

bazsi commented Feb 6, 2019 via email

@faxm0dem
Contributor

faxm0dem commented Feb 6, 2019

I'll test it using the production workload ASAP and report back here

@faxm0dem
Contributor

faxm0dem commented Feb 13, 2019

@bazsi I managed to test this in production (12 fairly recent PowerEdge machines with 10k spinning SATA disks running Elasticsearch, 1 large VM with 8VCPU running syslog-ng and loggen).

The highest throughput I could get was 30'000 messages per second:

loggen

$ loggen -r 50000 -I 120 -P -S 127.0.0.1 514 --active-connections=8
...
average rate = 31637.14 msg/sec, count=3846907, time=121.595, (average) msg size=260, bandwidth=8032.87 kB/sec

The VM's load was pretty high during the longest tests: between 0.9 and 1.1 normalized.
The Elasticsearch nodes' load was between 0.1 and 0.4 normalized.
The syslog-ng queue stayed very low and never contained more than 4000 messages.

syslog-ng config

source s_local_tcp {
    network(
        transport(
            tcp
        ),
        port(
            514
        ),
        ip(
            127.0.0.1
        ),
        flags(
            syslog-protocol
        )
    );
};

elasticsearch-http(
  workers(12)
  batch_lines(1024)
  batch_timeout(10000)
  timeout(10)
  index("test-syslog_ng-elastic-http")
  url("https://node221.example.com:9200/_bulk" "https://node222.example.com:9200/_bulk" "https://node223.example.com:9200/_bulk" "https://node05.example.com:9200/_bulk" "https://node08.example.com:9200/_bulk" "https://node27.example.com:9200/_bulk" "https://node53.example.com:9200/_bulk" "https://node54.example.com:9200/_bulk" "https://node55.example.com:9200/_bulk" "https://node83.example.com:9200/_bulk" "https://node84.example.com:9200/_bulk" "https://node85.example.com:9200/_bulk")
  template("$(format-json -s all-nv-pairs -x __* -x tmp.* -x SOURCE -x PROGRAM -x MESSAGE -x PID -x HOST_FROM -x HOST -x LEGACY_MSGHDR -p uniqid=$UNIQID --rekey timestamp --add-prefix @ --rekey .classifier.* --add-prefix pdb --rekey .SDATA.auto.* --shift 12 --rekey .SDATA.* --shift 7 --rekey .* --shift 1)")
  time-zone("UTC")
  type("syslog")
  tls (
    ca-file('/etc/elasticsearch/coloss/ca.pem')
    cert-file('/etc/syslog-ng/coloss-analyzer.crt')
    key-file('/etc/syslog-ng/coloss-analyzer.key')
    peer-verify(yes)
  )
  disk-buffer(reliable(no) dir("/var/lib/syslog-ng-disq/") disk-buf-size(53687091200) mem-buf-length(200
  `__VARARGS__`
);

elastic

Index settings:

# GET /test-syslog_ng-elastic-http/_settings
{
  "test-syslog_ng-elastic-http" : {
    "settings" : {
      "index" : {
        "creation_date" : "1550046604958",
        "number_of_shards" : "12",
        "number_of_replicas" : "0",
        "uuid" : "mJQlJQerRhmUuEvzgc-flw",
        "version" : {
          "created" : "6030299"
        },
        "provided_name" : "test-syslog_ng-elastic-http"
      }
    }
  }
}

As you can see the index spans all 12 nodes:

# GET /_cat/shards | grep test-syslog_ng
test-syslog_ng-elastic-http      8  p STARTED   320103  151.6mb 10.0.238.222 node222
test-syslog_ng-elastic-http      4  p STARTED   320175  151.5mb 10.0.105.59  node08
test-syslog_ng-elastic-http      2  p STARTED   320333  151.2mb 10.0.104.70  node53
test-syslog_ng-elastic-http      9  p STARTED   320612  151.5mb 10.0.108.89  node84
test-syslog_ng-elastic-http      6  p STARTED   320635  151.4mb 10.0.238.221 node221
test-syslog_ng-elastic-http      3  p STARTED   320948  151.6mb 10.0.108.2   node55
test-syslog_ng-elastic-http      11 p STARTED   320125  151.5mb 10.0.108.93  node85
test-syslog_ng-elastic-http      7  p STARTED   321130  151.4mb 10.0.108.88  node83
test-syslog_ng-elastic-http      1  p STARTED   321431  151.6mb 10.0.105.97  node27
test-syslog_ng-elastic-http      10 p STARTED   320441  151.4mb 10.0.238.223 node223
test-syslog_ng-elastic-http      5  p STARTED   320062  151.2mb 10.0.108.1   node54
test-syslog_ng-elastic-http      0  p STARTED   320912  151.4mb 10.0.104.140 node05

Moreover, all documents were written; no messages were lost:

# GET /test-syslog_ng-elastic-http/_count
{"count":3846907,"_shards":{"total":12,"successful":12,"skipped":0,"failed":0}}

It seems the ES cluster isn't saturated, but the syslog-ng node is.
I'm guessing this is due to the VM's size, and TLS overhead.

Still, 30k events per second seems pretty decent, and that's 10 times more than we need in production.

@bazsi
Collaborator

bazsi commented Feb 13, 2019 via email

@faxm0dem
Contributor

I did an additional test: I simulated the failure of one URL, and this resulted in duplicated documents in Elasticsearch: the difference between sent messages and indexed messages was a multiple of the batch size.
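
A multiple of the batch size is what one would expect if whole batches are resent after a failure and Elasticsearch assigns a fresh _id to each copy. The following Python simulation is only a sketch under that assumption; `index_batches` and the `uniqid` field are hypothetical, and the dictionary merely models that an index action with an existing _id overwrites the document instead of adding a new one.

```python
def index_batches(batches, make_id=None):
    """Simulate indexing; a batch delivered twice models a resend after a failed URL."""
    store = {}
    auto_id = 0
    for batch in batches:
        for doc in batch:
            if make_id is None:
                # Auto-generated id: every delivery creates a new document.
                auto_id += 1
                key = f"auto-{auto_id}"
            else:
                # Client-supplied id: a resend overwrites the same document.
                key = make_id(doc)
            store[key] = doc
    return store


batch_size = 4
docs = [{"uniqid": f"host@{n}", "message": f"m{n}"} for n in range(8)]
first, second = docs[:batch_size], docs[batch_size:]
deliveries = [first, first, second]  # the first batch is sent twice

without_ids = index_batches(deliveries)
with_ids = index_batches(deliveries, make_id=lambda doc: doc["uniqid"])
print(len(without_ids) - len(docs))  # surplus documents with auto-generated ids
print(len(with_ids) - len(docs))     # surplus documents with client-supplied ids
```

In this model the surplus equals one batch with auto-generated ids and zero with client-supplied ids, which is one reason an optional custom id can be useful.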

@faxm0dem
Contributor

faxm0dem commented Feb 13, 2019

@bazsi I replaced the template with format-json -s all-nv-pairs and removed the disk-buffer.
The performance is more or less the same.
I'll try with a larger, maybe physical, node if needed.

EDIT: there was a syntax error in my config. The correct conclusion is: there is little influence on the performance when simplifying the template, but disabling the unreliable disk-buffer significantly increases performance: I can easily get 48k/s

@pzoleex
Collaborator Author

pzoleex commented Feb 13, 2019

@faxm0dem Fabien, thanks for the tests!
can you share the network source configuration as well?
workers(12)
batch_lines(1024)
batch_timeout(10000)
I assume you are using flow-control.

With this configuration, the log-iw-size() of the network source should be 12 * 1024 * the number of active TCP connections (8), so log-iw-size(98304) or higher is the optimal value.

This is because batching has a side effect when batch_timeout() is set: the destination won't flush until batch_lines() or batch_timeout() is reached, but every message in the batch decreases the log_iw of the source.
In practice, if batch_lines(1024) is set and log_iw is, for example, just 100, then after 100 incoming messages the source is suspended and the destination only flushes when the timeout elapses, because the batch never reaches 1024 messages.
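
The arithmetic and the stall condition can be written down as a short Python sketch. Both helpers are hypothetical and encode only the rule of thumb given in this comment, not syslog-ng's real window accounting.

```python
def suggested_log_iw_size(workers, batch_lines, active_connections):
    """Rule of thumb from this comment: workers * batch_lines * active connections."""
    return workers * batch_lines * active_connections


def batch_can_fill(log_iw, batch_lines):
    """With flow control the source is suspended once log_iw messages are
    in flight, so a batch can only fill up if the window is at least batch_lines."""
    return log_iw >= batch_lines


print(suggested_log_iw_size(workers=12, batch_lines=1024, active_connections=8))
print(batch_can_fill(log_iw=100, batch_lines=1024))
print(batch_can_fill(log_iw=98304, batch_lines=1024))
```

With the values from the test above, the suggested window is 98304, while a window of 100 can never fill a 1024-line batch, so every flush would wait for the timeout.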

Another tip: you could try it without the syslog protocol; there is currently a performance issue when the syslog protocol is used.

@faxm0dem
Contributor

faxm0dem commented Feb 13, 2019

I updated my comment to add the network source options. I didn't set any log_iw or log_iw_size explicitly, but I don't know the default values.

@pzoleex what do you suggest to replace syslog() with?

@faxm0dem
Contributor

I managed to get 57k/s by just simplifying my log statement (removed many parsers, rewrites).
Setting log_iw_size to what @pzoleex suggested didn't improve anything

@bazsi
Collaborator

bazsi commented Feb 13, 2019 via email

@bazsi
Collaborator

bazsi commented Feb 13, 2019 via email

@faxm0dem
Contributor

The way I understand ES, if number_of_shards is a multiple of the number of data nodes, and if we let ES compute its own _id, everything should be balanced equally across all nodes. I have detailed monitoring, and during the tests all nodes seemed to ingest more or less the same workload. I did see a systematic 10% difference between one group of nodes and the other, which I'll investigate later, but I think it's due to our production workload, which is being tiered.

@nbari

nbari commented Mar 7, 2019

In what version is elasticsearch-http available? I'm currently trying 3.20 on FreeBSD but getting this error:

Error parsing destination statement, destination plugin elasticsearch-http not found

Is it only available with curl, or are java and java_mod required?

@MrAnno
Collaborator

MrAnno commented Mar 7, 2019

@nbari Hopefully, it will be included in the next release (version 3.21).
Until then, the elasticsearch2() Java driver can be used in http mode.

elasticsearch-http() will be completely native, curl-based.

@pzoleex pzoleex changed the title WIP: Added elasticsearch-http() destination Added elasticsearch-http() destination Mar 11, 2019
@pzoleex
Collaborator Author

pzoleex commented Mar 11, 2019

WIP flag removed because #2519 has already been merged.

@pzoleex
Collaborator Author

pzoleex commented Mar 11, 2019

@kira-syslogng test this please;

@kira-syslogng
Contributor

Build SUCCESS

@Kokan Kokan added this to the syslog-ng-3.21 milestone Mar 11, 2019
@lbudai lbudai merged commit 3e9f5d7 into syslog-ng:master Mar 12, 2019