Missing protocol version in error about unknown protocol version #14

Closed
hc4 opened this Issue Oct 14, 2016 · 15 comments

@hc4
Contributor

hc4 commented Oct 14, 2016

Just upgraded to 1.1.2 and got this error:

java.lang.Exception: Unknown beats protocol version: {}
        at org.graylog.plugins.beats.BeatsFrameDecoder.checkVersion(BeatsFrameDecoder.java:155) ~[?:?]

According to the source, the format argument is missing.
Another strange thing is that I got this error only once... Maybe it was a network error.
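
The literal `{}` in the stack trace suggests an SLF4J-style placeholder was passed to the exception constructor, which does not interpolate it. A minimal sketch of the difference, assuming the message text from the stack trace; the class and method names here are hypothetical and are not the plugin's actual code:

```java
// Hypothetical stand-in for the version check; not the plugin's actual code.
final class VersionMessageDemo {
    /** Mirrors the reported symptom: Exception never fills in an SLF4J-style "{}". */
    static String brokenMessage(byte version) {
        return new Exception("Unknown beats protocol version: {}").getMessage();
    }

    /** One possible fix: format the message before constructing the exception. */
    static String fixedMessage(byte version) {
        return new Exception(
                String.format("Unknown beats protocol version: %d", version)).getMessage();
    }

    public static void main(String[] args) {
        byte version = 0x4A; // arbitrary example byte
        System.out.println(brokenMessage(version));
        System.out.println(fixedMessage(version));
    }
}
```

With the version included, a message like this would also show whether the offending byte looks like a frame type rather than a version.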

@joschi

Contributor

joschi commented Oct 14, 2016

@hc4 Which beat are you using and in which version?

@hc4

Contributor

hc4 commented Oct 14, 2016

I'm using filebeat and topbeat same version on every host. But I got the error only once :)
So the problem is surely not in the beats.

@joschi

Contributor

joschi commented Oct 14, 2016

@hc4

I'm using filebeat and topbeat same version on every host.

And the version being…?

Additionally, please attach the configuration for each beat and the configuration of your Beats input.

@hc4

Contributor

hc4 commented Oct 14, 2016

1.2.3

@joschi

Contributor

joschi commented Oct 15, 2016

@hc4 And the configuration for each beat and the configuration of your Beats input…

@hc4

Contributor

hc4 commented Oct 15, 2016

I'll check the configs on Monday.
I looked at the code and have a question.
Shouldn't there be a break after line 82?

@hc4

Contributor

hc4 commented Oct 17, 2016

Filebeat output config:

  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false

Beats input:

{
 "title": "Beats TLS",
 "global": false,
 "name": "Beats",
 "content_pack": null,
 "created_at": "2016-10-14T09:33:24.342Z",
 "type": "org.graylog.plugins.beats.BeatsInput",
 "creator_user_id": "user id",
 "attributes": {
   "recv_buffer_size": 212992,
   "port": 5044,
   "tls_key_file": "key8 file",
   "tls_enable": true,
   "tls_key_password": "key pass",
   "tcp_keepalive": false,
   "tls_client_auth_cert_file": "",
   "tls_client_auth": "disabled",
   "override_source": "",
   "bind_address": "0.0.0.0",
   "tls_cert_file": "crt file"
 },
 "static_fields": {},
 "node": "node id",
 "id": "input id"
}

@jalogisch jalogisch added this to the 2.2.0 milestone Oct 17, 2016

@joschi

Contributor

joschi commented Oct 17, 2016

@hc4

Filebeat output config:

That's not the complete configuration. Please provide the complete configuration of both of your beats…

@hc4

Contributor

hc4 commented Oct 17, 2016

Which sections exactly do you need?
The configs may contain some sensitive information.
E.g. I'm not sure whether you need the prospectors config.

@hc4

Contributor

hc4 commented Oct 17, 2016

filebeat:
  registry_file: registry
  config_dir: config dir
output:
  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false
shipper:
logging:
  to_files: true
  files:
    path: logs path
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  selectors: ["*"]
  level: info

@hc4

Contributor

hc4 commented Oct 17, 2016

The problem recurs several times per day.
I think it's caused by an unstable network connection (the server is located in China and has a bad internet connection).

@hc4

Contributor

hc4 commented Oct 19, 2016

I just looked deeper into the code.
It seems there is a bug in the ReplayingDecoder usage.
The methods processWindowSizeFrame, parseDataFrame, processCompressedFrame and parseJsonFrame check whether enough data is available in the buffer and, if not, reset the read index to the last checkpoint (by calling channelBuffer.resetReaderIndex()).
At the moment these methods are called, the last checkpoint is FRAME_TYPE.
But after the buffer has been processed by these methods, the checkpoint is always changed to PROTOCOL_VERSION.
So the next decode call assumes that the checkpoint is PROTOCOL_VERSION, while the actual read index points to FRAME_TYPE. The first byte of the frame type is then processed as the version and the error gets thrown.
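
The mismatch described above can be reproduced in a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use Netty, the class and its fields are hypothetical, and the wire format is simplified to a version byte `'2'`, a frame type `'W'` and a 4-byte window size. The `buggy` flag advances the state to PROTOCOL_VERSION even when the read index was reset; the other variant keeps the state, roughly as a replay would.

```java
import java.util.Arrays;

// Plain-Java simulation; hypothetical names, not the plugin's or Netty's code.
final class CheckpointMismatchDemo {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    private final boolean buggy;
    private byte[] buf = new byte[0];
    private int readerIndex;
    private int checkpointIndex;
    private State state = State.PROTOCOL_VERSION;

    CheckpointMismatchDemo(boolean buggy) { this.buggy = buggy; }

    /** Remembers the current read position together with the next state. */
    private void checkpoint(State next) {
        checkpointIndex = readerIndex;
        state = next;
    }

    /** Appends bytes and decodes; returns the window size, or -1 if more data is needed. */
    int feed(int... chunk) {
        int oldLength = buf.length;
        buf = Arrays.copyOf(buf, oldLength + chunk.length);
        for (int i = 0; i < chunk.length; i++) buf[oldLength + i] = (byte) chunk[i];
        return decode();
    }

    private int decode() {
        if (state == State.PROTOCOL_VERSION) {
            if (readerIndex == buf.length) return -1;
            byte version = buf[readerIndex++];
            if (version != '2') {
                throw new IllegalStateException("Unknown beats protocol version: " + version);
            }
            checkpoint(State.FRAME_TYPE);
        }
        if (readerIndex == buf.length) return -1;
        byte frameType = buf[readerIndex++];
        if (frameType != 'W') {
            throw new IllegalStateException("Frame type not covered by this sketch: " + frameType);
        }
        boolean complete = buf.length - readerIndex >= 4;
        int window = -1;
        if (complete) {
            window = 0;
            for (int i = 0; i < 4; i++) window = (window << 8) | (buf[readerIndex++] & 0xFF);
        } else {
            readerIndex = checkpointIndex; // back to the frame type byte
        }
        // Buggy variant: the state moves on although the read index was just reset.
        if (complete || buggy) checkpoint(State.PROTOCOL_VERSION);
        return window;
    }

    /** Feeds one window-size frame split across two chunks and reports the outcome. */
    static String run(boolean buggy) {
        CheckpointMismatchDemo decoder = new CheckpointMismatchDemo(buggy);
        try {
            decoder.feed('2', 'W', 0, 0);           // frame cut off by the network
            return "window=" + decoder.feed(0, 10); // rest of the frame arrives
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + run(true));
        System.out.println("fixed: " + run(false));
    }
}
```

In the buggy run the second chunk makes the decoder read the `'W'` byte as a protocol version, which matches the error seen on an unstable connection where frames arrive in pieces.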

@hc4

Contributor

hc4 commented Oct 19, 2016

I removed all buffer checks from the code.
I will test it for some time.
There is a possible bug in my version: if a compressed frame contains an incorrect data frame, the decoder will try to parse the same broken message forever. But this shouldn't happen if the protocol is implemented correctly on both sides :)

Fixed decoder - BeatsFrameDecoder.java
If the problem does not occur in the next few days, I can make a PR.

@hc4

Contributor

hc4 commented Oct 19, 2016

Just got the incorrect version message again.
But this time it happened during cleanup after a disconnect.
During cleanup, ReplayingDecoder simply ignores the REPLAY exception.
So instead of aborting the read, BeatsFrameDecoder thinks that everything has been read and incorrectly sets the state to PROTOCOL_VERSION.
I added an empty implementation of decodeLast(). It is empty because a read frame must be ACKed, which is impossible after the beat has disconnected.
New version - BeatsFrameDecoder.java

@hc4

Contributor

hc4 commented Oct 24, 2016

Fixed the decodeLast logic (the method must read all data from the buffer).
No PROTOCOL_VERSION errors so far (3+ days). Only "Connection timed out" (as expected).
BeatsFrameDecoder.java
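
The idea behind the decodeLast change can be illustrated without Netty. The following is a rough sketch, assuming a hypothetical buffer holder with a read index and a decoding state; its `decodeLast()` is only a stand-in for the real hook and is not the code from the linked file. It discards whatever is still buffered (the frame could not be ACKed anyway) and resets the state.

```java
// Plain-Java illustration; hypothetical names, not the plugin's or Netty's code.
final class DecodeLastDemo {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    private final byte[] buf;
    private int readerIndex;
    private State state;

    DecodeLastDemo(byte[] leftover, State stateAtDisconnect) {
        this.buf = leftover.clone();
        this.state = stateAtDisconnect;
    }

    int readableBytes() { return buf.length - readerIndex; }

    State currentState() { return state; }

    /** Stand-in for a decodeLast()-style hook that runs after the peer disconnected. */
    void decodeLast() {
        // The partial frame can never be ACKed, so skip it instead of parsing it.
        readerIndex = buf.length;
        // Reset the state so nothing stale is left behind.
        state = State.PROTOCOL_VERSION;
    }

    public static void main(String[] args) {
        // A frame type byte and half a window size were left when the beat disconnected.
        DecodeLastDemo decoder = new DecodeLastDemo(new byte[] {'W', 0, 0}, State.FRAME_TYPE);
        decoder.decodeLast();
        System.out.println(decoder.readableBytes() + " readable, state " + decoder.currentState());
    }
}
```

Consuming every remaining byte matters here because leaving data in the buffer would let a later decode attempt start in the middle of a frame.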

hc4 added a commit to hc4/graylog-plugin-beats that referenced this issue Oct 26, 2016

Fix frame decoding in case of lost connection
Fixes Graylog2#14 and Graylog2#15
Removed readable bytes checks (it will be checked by ReplayingDecoder)
Added decodingLast implementation, that ignores all received data (we can't use it, because can't send ACK) and resets decoding state
Fixed error message for bad protocol version

@joschi joschi closed this in #17 Oct 26, 2016

joschi added a commit that referenced this issue Oct 26, 2016

Fix frame decoding in case of lost connection (#17)
Fixes #14
Fixes #15
(cherry picked from commit 8c751e0)
