Missing protocol version in error about unknown protocol version #14

Closed
hc4 opened this issue Oct 14, 2016 · 15 comments

@hc4 hc4 commented Oct 14, 2016

Just upgraded to 1.1.2 and got this error:

java.lang.Exception: Unknown beats protocol version: {}
        at org.graylog.plugins.beats.BeatsFrameDecoder.checkVersion(BeatsFrameDecoder.java:155) ~[?:?]

According to the source, the format argument is missing.
Another strange thing is that I got this error only once... Maybe it was a network error.
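
To illustrate the report above: the `{}` in the message looks like an SLF4J-style placeholder, which a plain `java.lang.Exception` never fills in. Below is a minimal sketch of a check that formats the value itself. It assumes the expected version byte is `'2'`, and the class and method are hypothetical stand-ins, not the plugin's actual code:

```java
// Hypothetical stand-in for the decoder's version check; not the plugin's actual code.
final class VersionCheckSketch {
    // Assumption for illustration: the expected protocol version byte is '2'.
    static final byte EXPECTED_VERSION = '2';

    // Throwable does not interpolate "{}" placeholders, so the value is appended explicitly.
    static void checkVersion(byte version) throws Exception {
        if (version != EXPECTED_VERSION) {
            throw new Exception("Unknown beats protocol version: " + version);
        }
    }

    public static void main(String[] args) {
        try {
            checkVersion((byte) 'W');   // a frame type byte misread as the version
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the value in the message, a misread byte such as `'W'` (87) would be visible directly in the log.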

@joschi joschi commented Oct 14, 2016

@hc4 Which beat are you using and in which version?

@hc4 hc4 commented Oct 14, 2016

I'm using filebeat and topbeat same version on every host. But I got the error only once :)
So the problem is surely not in the beats.

@joschi joschi commented Oct 14, 2016

@hc4

I'm using filebeat and topbeat same version on every host.

And the version being…?

Additionally, please attach the configuration for each beat and the configuration of your Beats input.

@hc4 hc4 commented Oct 14, 2016

1.2.3

@joschi joschi commented Oct 15, 2016

@hc4 And the configuration for each beat and the configuration of your Beats input…

@hc4 hc4 commented Oct 15, 2016

I'll check the configs on Monday.
I looked at the code and have a question.
Shouldn't there be a break after line 82?

@hc4 hc4 commented Oct 17, 2016

Filebeat output config:

  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false

Beats input:

{
 "title": "Beats TLS",
 "global": false,
 "name": "Beats",
 "content_pack": null,
 "created_at": "2016-10-14T09:33:24.342Z",
 "type": "org.graylog.plugins.beats.BeatsInput",
 "creator_user_id": "user id",
 "attributes": {
   "recv_buffer_size": 212992,
   "port": 5044,
   "tls_key_file": "key8 file",
   "tls_enable": true,
   "tls_key_password": "key pass",
   "tcp_keepalive": false,
   "tls_client_auth_cert_file": "",
   "tls_client_auth": "disabled",
   "override_source": "",
   "bind_address": "0.0.0.0",
   "tls_cert_file": "crt file"
 },
 "static_fields": {},
 "node": "node id",
 "id": "input id"
}
@jalogisch jalogisch added this to the 2.2.0 milestone Oct 17, 2016

@joschi joschi commented Oct 17, 2016

@hc4

Filebeat output config:

That's not the complete configuration. Please provide the complete configuration of both of your beats…

@hc4 hc4 commented Oct 17, 2016

Which section exactly do you need?
The configs may contain some sensitive information.
E.g. I'm not sure whether you need the prospectors config.

@hc4 hc4 commented Oct 17, 2016

filebeat:
  registry_file: registry
  config_dir: config dir
output:
  logstash:
    hosts: ["1.1.1.1:5044", "2.2.2.2:5044"]
    max_retries: -1
    tls:
      certificate_authorities: ["pem file"]
      insecure: false
shipper:
logging:
  to_files: true
  files:
    path: logs path
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  selectors: ["*"]
  level: info

@hc4 hc4 commented Oct 17, 2016

The problem repeats several times per day.
I think it's caused by an unstable network connection (the server is located in China and has bad internet).

@hc4 hc4 commented Oct 19, 2016

I just looked deeper into the code.
It seems there is a bug in the ReplayingDecoder usage.
The methods processWindowSizeFrame, parseDataFrame, processCompressedFrame and parseJsonFrame check whether enough data is available in the buffer and, if not, reset the read index to the last checkpoint (by calling channelBuffer.resetReaderIndex()).
At the moment these methods are called, the last checkpoint is FRAME_TYPE.
But after the buffer has been processed by these methods, the checkpoint is always changed to PROTOCOL_VERSION.
So the next decode call assumes that the checkpoint is PROTOCOL_VERSION, while the actual read index points to FRAME_TYPE. The first byte of the frame type is then processed as the version and the error gets thrown.
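
The sequence described above can be reproduced with a small self-contained model. This is only a sketch under stated assumptions: the `Decoder` class below is hypothetical and imitates a replaying decoder in plain Java (it is neither Netty's `ReplayingDecoder` nor the plugin's code), it models only a window size frame, and it assumes a version byte of `'2'`, a frame type byte of `'W'` and a 4-byte big-endian window size.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a replaying decoder; none of these classes are Netty's or the plugin's.
final class CheckpointSketch {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    // Signals "not enough bytes yet", similar in spirit to a replaying decoder's REPLAY signal.
    static final class Replay extends RuntimeException {}

    static final class Decoder {
        private final boolean manualChecks;   // true = the behaviour described in the comment above
        private final List<Byte> bytes = new ArrayList<>();
        private int readerIndex = 0;
        private int checkpointIndex = 0;
        private State state = State.PROTOCOL_VERSION;
        final List<Integer> windowSizes = new ArrayList<>();

        Decoder(boolean manualChecks) { this.manualChecks = manualChecks; }

        private byte readByte() {
            if (readerIndex >= bytes.size()) throw new Replay();
            return bytes.get(readerIndex++);
        }

        // Remember the current reader index together with the state to resume in.
        private void checkpoint(State next) {
            checkpointIndex = readerIndex;
            state = next;
        }

        // Feed newly received bytes and decode as far as possible.
        void receive(int... chunk) {
            for (int b : chunk) bytes.add((byte) b);
            try {
                while (decodeStep()) { }
            } catch (Replay r) {
                readerIndex = checkpointIndex;   // rewind; the state is left unchanged
            }
        }

        private boolean decodeStep() {
            if (state == State.PROTOCOL_VERSION) {
                byte version = readByte();
                if (version != '2') {
                    throw new IllegalStateException("Unknown beats protocol version: " + version);
                }
                checkpoint(State.FRAME_TYPE);
                return true;
            }
            readByte();   // frame type; only 'W' (window size) is modelled here
            boolean complete = true;
            if (manualChecks && bytes.size() - readerIndex < 4) {
                readerIndex = checkpointIndex;   // manual reset back to the frame type byte
                complete = false;
            } else {
                int size = 0;
                for (int i = 0; i < 4; i++) size = (size << 8) | (readByte() & 0xFF);
                windowSizes.add(size);
            }
            // With manual checks this also runs for an incomplete frame, which is the bug.
            checkpoint(State.PROTOCOL_VERSION);
            return complete;
        }
    }

    public static void main(String[] args) {
        Decoder fixed = new Decoder(false);
        fixed.receive('2', 'W', 0, 0);   // frame split across two reads
        fixed.receive(0, 10);
        System.out.println("without manual checks: " + fixed.windowSizes);

        Decoder buggy = new Decoder(true);
        buggy.receive('2', 'W', 0, 0);
        try {
            buggy.receive(0, 10);
        } catch (IllegalStateException e) {
            System.out.println("with manual checks: " + e.getMessage());
        }
    }
}
```

In this model the variant without manual checks resumes at the frame type after the split read, while the variant with manual checks re-reads `'W'` as the version byte, which matches the error in the report.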

@hc4 hc4 commented Oct 19, 2016

Removed all buffer checks from the code.
Will test for some time.
There is a possible bug in my version: if a compressed frame contains an incorrect data frame, the decoder will try to parse the same broken message forever. But this shouldn't happen if the protocol is implemented correctly on both sides :)

Fixed decoder - BeatsFrameDecoder.java
If the problem does not occur in the next few days, I can make a PR.

@hc4 hc4 commented Oct 19, 2016

Just got the incorrect version message again.
But this time it was during cleanup after a disconnect.
During cleanup, ReplayingDecoder simply ignores the REPLAY exception.
So instead of stopping reading, BeatsFrameDecoder thinks that everything has been read and incorrectly sets the state to PROTOCOL_VERSION.
Added an empty implementation of decodeLast(). It is empty because a read frame must be ACKed, which is impossible after the beat has disconnected.
New version - BeatsFrameDecoder.java

@hc4 hc4 commented Oct 24, 2016

Fixed the decodeLast logic (the method must read all data from the buffer).
No PROTOCOL_VERSION errors so far (3+ days). Only "Connection timed out" (as expected).
BeatsFrameDecoder.java
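
As a rough illustration of the decodeLast idea described in the last two comments (consume everything that is left and reset the state, because nothing can be ACKed after a disconnect), here is a minimal sketch. The `Decoder` class and its fields are hypothetical and only model the behaviour; this is not the linked BeatsFrameDecoder.java.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the disconnect path; not Netty's or the plugin's code.
final class DecodeLastSketch {
    enum State { PROTOCOL_VERSION, FRAME_TYPE }

    static final class Decoder {
        final Deque<Byte> unread = new ArrayDeque<>();
        State state = State.PROTOCOL_VERSION;

        // Called once after the peer has disconnected. A partially received frame can
        // no longer be completed or ACKed, so leftover bytes are dropped, not decoded.
        int decodeLast() {
            int discarded = unread.size();
            unread.clear();                   // consume everything left in the buffer
            state = State.PROTOCOL_VERSION;   // reset so no stale byte is read as a version
            return discarded;
        }
    }

    public static void main(String[] args) {
        Decoder decoder = new Decoder();
        decoder.state = State.FRAME_TYPE;     // mid-frame when the connection dropped
        for (byte b : new byte[] {'W', 0, 0}) decoder.unread.add(b);
        System.out.println("discarded " + decoder.decodeLast() + " bytes, state " + decoder.state);
    }
}
```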

hc4 added a commit to hc4/graylog-plugin-beats that referenced this issue Oct 26, 2016
Fixes Graylog2#14 and Graylog2#15
Removed readable bytes checks (it will be checked by ReplayingDecoder)
Added decodingLast implementation, that ignores all received data (we can't use it, because can't send ACK) and resets decoding state
Fixed error message for bad protocol version
@joschi joschi closed this in #17 Oct 26, 2016
joschi added a commit that referenced this issue Oct 26, 2016
Fixes #14
Fixes #15
joschi pushed a commit that referenced this issue Oct 26, 2016
Fixes #14
Fixes #15
(cherry picked from commit 8c751e0)
joschi pushed a commit that referenced this issue Oct 26, 2016
Fixes #14
Fixes #15
(cherry picked from commit 8c751e0)