Apologies in advance for a relatively vague issue description, but unfortunately even with debug logging it is difficult to shed light on this.
We have been running Logstash (latest stable) on Fargate platform version 1.3.0 with AWS MSK (Kafka) as input (i.e. as a consumer) for a couple of months without a single issue - no crashes, even with autoscaling triggering based on CPU usage.
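For context, the consumer side is nothing exotic - roughly this kind of pipeline (a sketch; the broker hostnames, topic, and group id below are placeholders, not our real values):

```
input {
  kafka {
    # MSK bootstrap brokers (placeholder hostnames)
    bootstrap_servers => "b-1.example.kafka.eu-west-1.amazonaws.com:9092,b-2.example.kafka.eu-west-1.amazonaws.com:9092"
    topics => ["app-logs"]
    group_id => "logstash-consumers"
    codec => "json"
  }
}

output {
  # real output elided - not relevant to this issue
  stdout { codec => "dots" }
}
```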
As soon as we switched to 1.4.0, though, we started noticing Logstash silently "crashing": the container/process is still running and health checks pass, but it is no longer consuming any logs.
Upon further inspection we unfortunately cannot see anything in the logs; however, we can tell that the connection to Kafka is no longer active.
Switching back to 1.3.0 without any other changes restored stability (it has now been running for over a week without a single "crash").
Given my limited visibility, and in conjunction with logstash-plugins/logstash-integration-kafka#15, my working theory is that the connection to Kafka is dropped unexpectedly, and that Logstash is at that point unable to detect it or recover.
Is it possible that some connection timeout was introduced between 1.3.0 and 1.4.0, or are there any other changes between the two platform versions that might interfere with long-lived Kafka connections?
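In the meantime, one workaround we are considering is to stop the consumer connection from ever sitting idle long enough to be dropped, by tightening the client-side settings on the kafka input. A sketch of what I mean - the values are guesses on our side, not tested recommendations:

```
input {
  kafka {
    # ...connection settings as above...

    # Recycle client connections well before any external idle timeout
    # could silently kill them (Kafka's default is 540000 ms / 9 min;
    # 300000 ms here is a guess, untested).
    connections_max_idle_ms => 300000

    # Fail faster on unanswered requests so a dead connection surfaces
    # as a client-side error instead of a silent hang.
    request_timeout_ms => 40000
  }
}
```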
For what it's worth, we have been running other applications (including a Kafka producer) on 1.4.0 for weeks without any issues, but those mostly use short-lived connections.