Feature Request: Managed CloudWatch metrics based on Envoy stats #61
Comments
@rclark are you using https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/master/cloudwatch-agent-dockerfile to build the cloudwatch-agent Docker image? I still see the following problems using it in the context of App Mesh with Fargate/ECS. We want cwagent to start before Envoy, and hence would like to use a DependsOn configuration on the Envoy container. This means the task's network namespace needs to ignore traffic from cwagent (via the iptables rules), i.e. cwagent has to run as UID 1337. However, the cwagent container creates config files on startup and fails with permission denied.
To get it to work I had to create a custom image using the following Dockerfile, similar to how Istio sets it up https://github.com/istio/istio/blob/master/docker
This gets cwagent to work as a sidecar and forward metrics from Envoy configured to publish statsd.
This still leaves me to build a dashboard, which is not simple to do and requires an understanding of Envoy. |
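The setup described in this comment can be sketched as ECS container definitions. This is only a minimal sketch, not the Dockerfile or task definition the commenter used: the container names and image arguments are hypothetical placeholders, and it assumes a cwagent image that has already been built so that UID 1337 can write its config files. The `user`, `dependsOn`, and `IgnoredUID` fields are standard ECS/App Mesh task definition settings; the remaining proxy properties are omitted.

```python
def build_container_definitions(cwagent_image, envoy_image):
    """Return ECS container definitions with cwagent starting before Envoy.

    Both image arguments are placeholders supplied by the caller.
    """
    cwagent = {
        "name": "cwagent",  # hypothetical container name
        "image": cwagent_image,
        "essential": True,
        # Assumption: the image lets UID 1337 write the agent's config files.
        # Running as 1337 lets the agent's traffic bypass Envoy (see IgnoredUID).
        "user": "1337",
    }
    envoy = {
        "name": "envoy",  # hypothetical container name
        "image": envoy_image,
        "essential": True,
        "user": "1337",
        # Envoy waits until the cwagent container has started.
        "dependsOn": [{"containerName": "cwagent", "condition": "START"}],
    }
    return [cwagent, envoy]


def build_proxy_configuration():
    """Return a partial App Mesh proxy configuration for the task.

    Only IgnoredUID is shown; ports and other properties are left out.
    """
    return {
        "type": "APPMESH",
        "containerName": "envoy",
        "properties": [
            # Traffic from processes running as this UID is not redirected.
            {"name": "IgnoredUID", "value": "1337"},
        ],
    }
```

The resulting dictionaries could be merged into a full task definition and registered with whatever tooling you already use; nothing here talks to AWS.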
Here are some action items to improve this
|
@kiranmeduri btw, the JSON/TOML config for statsd collection can just be passed to the cwagent image as an environment variable. There is no need to make a custom image. I started making an example that uses the CloudWatch agent and X-Ray to get data to show up in CloudWatch ServiceLens: https://github.com/lavignes/aws-app-mesh-examples/blob/service-lens/walkthroughs/howto-service-lens/app.yaml#L219 |
@lavignes I believe you still need to create an image if you want to run cwagent as UID 1337. Otherwise traffic from cwagent has to go via Envoy, and that may not be desirable if we want to monitor Envoy. Can you check if you can set the UID to 1337 in your container def? Thanks |
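As a rough illustration of passing the statsd config through an environment variable, the sketch below builds a minimal CloudWatch agent config and wraps it as an ECS environment entry. It is not taken from the linked app.yaml. The variable name `CW_CONFIG_CONTENT` and the port and interval values are assumptions to verify against the CloudWatch agent documentation for the image version you run.

```python
import json


def build_statsd_agent_config(port=8125, collection_interval=10, aggregation_interval=60):
    """Return a minimal CloudWatch agent config (JSON string) enabling statsd."""
    config = {
        "metrics": {
            "metrics_collected": {
                "statsd": {
                    # Listen on all interfaces on the given UDP port.
                    "service_address": f":{port}",
                    "metrics_collection_interval": collection_interval,
                    "metrics_aggregation_interval": aggregation_interval,
                }
            }
        }
    }
    return json.dumps(config)


def build_cwagent_environment(config_json):
    """Wrap the config as an ECS container `environment` list.

    Assumption: the agent image reads its config from CW_CONFIG_CONTENT.
    """
    return [{"name": "CW_CONFIG_CONTENT", "value": config_json}]
```

The returned list would go into the `environment` field of the cwagent container definition, so the stock image can be used without baking a config file into it.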
Hi. I am working on a blog post that shows how to view Envoy stats in CloudWatch. It is still in draft form, but you can check it out here: http://www.nickaws.net/aws/service_mesh/2019/12/29/AppMesh-Visibility.html |
I appreciate that there are ways to collect envoy statistics as CloudWatch metrics, and @nbrandaleone your blog post looks super helpful towards that implementation. But just to reiterate the key point of my original request: I shouldn't have to do this. App Mesh should be able to provide out-of-the-box metrics that provide me with a level of observability that I don't get by connecting a set of ECS services and load balancers.
|
@rclark thanks for the input. There is absolutely a learning curve here that is non-trivial. I think some of the action items that @kiranmeduri listed above would move us a lot closer to what many people need. One-click options for setting up a cwagent and generating opinionated dashboards are absolutely something that App Mesh should provide. |
Hey everyone, I’m a Product Manager for CloudWatch. We are looking for people to join our beta program to provide feedback and test App Mesh monitoring and troubleshooting as part of CloudWatch Container Insights. The beta program will allow you to test the collection and visualization of Prometheus metrics from Envoy. We are starting with Kubernetes. Email me if interested. |
@mchene Could you provide an email address? |
machene@amazon.com! No spamming! :) |
Oh boy, so much time has passed, and this is still not addressed in any way?
Is there seriously no official CloudWatch agent image that would handle App Mesh out-of-the-box? Also, duplicate: #122 |
Hi, is there any progress? |
For anyone still waiting for an official CloudWatch image that works with App Mesh/Envoy: consider migrating to the AWS OpenTelemetry Collector sidecar. Here's an example configuration: https://github.com/aws/aws-app-mesh-examples/blob/main/walkthroughs/howto-metrics-extension-ecs/README.md#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry The additional advantage is that you can filter StatsD metrics similarly to how CloudWatch filters Prometheus metrics, while still being able to process histogram metrics (like latency) that CloudWatch still cannot handle when scraping Prometheus. |
Tell us about your request
I would like to see app-mesh provide some level of out-of-the-box integration with CloudWatch. This would be an extremely useful "value-add" to present to teams looking into adopting app-mesh for their application architecture.
Which integration(s) is this request for?
Ideally, this would cover any of the potential integrations, since it's based on collection from Envoy stats, which are consistent across integrations.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, Envoy containers can be configured to output DogStatsD-compatible metrics, but the documentation provides no instructions for using that to begin accumulating metrics in CloudWatch.
Outside of using app-mesh, I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial, as the cw-agent is not well-designed for running in a docker container. Getting this to work involved reverse-engineering various shell scripts involved in configuring the agent. Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.
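To give a feel for what "reducing them to metrics that you're interested in" involves, here is a small sketch that parses a DogStatsD-style line (`name:value|type|#tag:value,...`) and keeps only metrics under chosen name prefixes. The sample metric name and tag in the usage note are hypothetical examples, not a list of what Envoy actually emits, and the parser ignores parts of the protocol such as sample rates.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: str
    kind: str  # e.g. "c" (counter), "g" (gauge), "ms" (timer)
    tags: dict = field(default_factory=dict)


def parse_dogstatsd_line(line):
    """Parse one `name:value|type|#k:v,...` line into a Metric."""
    head, *sections = line.strip().split("|")
    name, _, value = head.partition(":")
    if not name or not value or not sections:
        raise ValueError(f"not a DogStatsD line: {line!r}")
    tags = {}
    for section in sections[1:]:
        # Tag sections start with '#'; other sections (e.g. '@0.5') are skipped.
        if section.startswith("#"):
            for tag in section[1:].split(","):
                key, _, tag_value = tag.partition(":")
                tags[key] = tag_value
    return Metric(name=name, value=value, kind=sections[0], tags=tags)


def keep_metric(metric, allowed_prefixes):
    """Return True if the metric name starts with any allowed prefix."""
    return metric.name.startswith(tuple(allowed_prefixes))
```

For example, parsing the made-up line `envoy.cluster.upstream_rq_total:3|c|#envoy.cluster_name:backend` and calling `keep_metric(metric, ["envoy.cluster."])` would keep it, while an allow-list of `["envoy.http."]` would drop it. The hard part the request describes, knowing which prefixes matter, is exactly what this sketch leaves to the reader.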
I believe that the team should make some opinionated decisions about which metrics would be automatically aggregated to CloudWatch from the envoy containers in your mesh. Further documentation about how to configure those metrics to meet your needs should also be included. If the implementation requires the customer to run the cloudwatch agent, then that application needs to be supported in each of app-mesh's integration scenarios (including ECS and EKS).
Are you currently working around this issue?
We are only in the prototyping stages of using app-mesh. Mostly I see this as a hindrance to adoption. If one of the primary value-adds of app-mesh is that it provides enhanced network-layer visibility, then the service ought to present that functionality by default.
Thanks!