Feature Request: Managed CloudWatch metrics based on Envoy stats #61

Open
rclark opened this issue Apr 24, 2019 · 13 comments

rclark commented Apr 24, 2019

Tell us about your request

I would like to see app-mesh provide some level of out-of-the-box integration with CloudWatch. This would be an extremely useful "value-add" to present to teams looking into adopting app-mesh for their application architecture.

Which integration(s) is this request for?

Ideally, this would cover any of the potential integrations, since it's based on collection from Envoy stats, which are consistent across integrations.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Currently, envoy containers can be configured to output DogStatsD-compatible metrics, but the documentation provides no instructions for using that to begin accumulating metrics in CloudWatch.

Outside of using app-mesh, I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial, as the cw-agent is not well-designed for running in a docker container. Getting this to work involved reverse-engineering various shell scripts involved in configuring the agent.

Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.

I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh. Further documentation about how to configure those metrics to meet your needs should also be included. If the implementation requires the customer to run the cloudwatch agent, then that application needs to be supported in each of app-mesh's integration scenarios (including ECS and EKS).

Are you currently working around this issue?

We are only in the prototyping stages of using app-mesh. Mostly I see this as a hindrance to adoption. If one of the primary value-adds of app-mesh is that it provides enhanced network-layer visibility, then the service ought to present that functionality by default.

Thanks!

@shubharao shubharao added the Docs label Sep 28, 2019
@shubharao shubharao added the Bug Something isn't working label Sep 28, 2019
@kiranmeduri

@rclark are you using https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/master/cloudwatch-agent-dockerfile to build the cloudwatch-agent Docker image?

However, I still see the following problems using it in the context of App Mesh with Fargate/ECS. We want cwagent to be started before Envoy, and hence would like to use the DependsOn configuration on the Envoy container. This means the task's network namespace needs to ignore traffic from cwagent (via the iptables rules), i.e. cwagent has to run as UID 1337. When run as that user, the cwagent container tries to create its config files on startup and fails with permission denied.

2019/10/11 14:24:46 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied 

To get it to work, I had to create a custom image using the following Dockerfile, similar to how Istio sets it up (https://github.com/istio/istio/blob/master/docker):

FROM debian:latest as build

RUN apt-get update &&  \
    apt-get install -y ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*

RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb && \
    dpkg -i -E amazon-cloudwatch-agent.deb && \
    rm -rf /tmp/* && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader

# NOTICE: copied from https://github.com/istio/istio/blob/master/docker/Dockerfile.base
# Change ownership to allow agent to write generated files
RUN useradd -m --uid 1337 sidecar-agent && \
    echo "sidecar-agent ALL=NOPASSWD: ALL" >> /etc/sudoers && \
    chown -R sidecar-agent /opt/aws/amazon-cloudwatch-agent

FROM scratch

COPY --from=build /tmp /tmp
COPY --from=build /etc/passwd /etc/passwd
COPY --from=build /etc/sudoers /etc/sudoers
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent
COPY cwagentconfig /etc/cwagentconfig

USER sidecar-agent

ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]

This gets cwagent to work as a sidecar and forward metrics from Envoy, with Envoy configured to publish StatsD metrics:

          Environment:
            - Name: 'ENABLE_ENVOY_XRAY_TRACING'
              Value: '1'
            - Name: 'ENABLE_ENVOY_STATS_TAGS'
              Value: '1'
            - Name: 'ENABLE_ENVOY_DOG_STATSD'
              Value: '1'
            - Name: 'APPMESH_VIRTUAL_NODE_NAME'
              Value:
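
Before debugging the agent side, it can help to confirm that Envoy is actually emitting StatsD datagrams. The following is a minimal sketch of a local UDP listener for that purpose. It assumes the sink targets 127.0.0.1:8125 (the conventional StatsD port; the thread does not state the address), and the function names `open_listener` and `receive_statsd_lines` are hypothetical helpers, not part of any AWS or Envoy tooling.

```python
import socket


def open_listener(host="127.0.0.1", port=8125, timeout=5.0):
    """Bind a UDP socket where a StatsD sink is assumed to send datagrams.

    The default address is an assumption (conventional StatsD port).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # avoid blocking forever if nothing is sent
    sock.bind((host, port))
    return sock


def receive_statsd_lines(sock, max_datagrams):
    """Read up to max_datagrams datagrams and return their non-empty lines.

    Stops early and returns what was collected if the socket times out.
    """
    lines = []
    for _ in range(max_datagrams):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        text = data.decode("utf-8", errors="replace")
        lines.extend(line for line in text.splitlines() if line)
    return lines
```

Calling `receive_statsd_lines(open_listener(), 10)` in place of the agent and printing the result shows the raw metric lines, which is also a quick way to see which Envoy stats exist before deciding what to keep.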

This still leaves me to build a dashboard, which is not simple to do and requires an understanding of Envoy.

@kiranmeduri

kiranmeduri commented Oct 11, 2019

Here are some action items to improve this:

  • Provide a cwagent Docker image that works out-of-box with App Mesh.
  • Enable one-click creation of CW dashboards from the AWS Console for virtual-service.
  • Provide CFN template snippets to generate CW dashboard for virtual-service.
  • Provide CDK utilities to generate CW dashboard for virtual-service.

@bcelenza bcelenza removed the Bug Something isn't working label Nov 6, 2019
@bcelenza bcelenza changed the title [request]: Managed CloudWatch metrics based on Envoy stats Feature Request: Managed CloudWatch metrics based on Envoy stats Nov 6, 2019
@lavignes

@kiranmeduri btw, the json/toml config for statsd collection can just be passed to the cwagent image as an environment variable. There is no need to make a custom image.

I started making an example that uses the cloudwatch agent and xray to get data to show up in CloudWatch ServiceLens: https://github.com/lavignes/aws-app-mesh-examples/blob/service-lens/walkthroughs/howto-service-lens/app.yaml#L219
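
A minimal sketch of the environment-variable approach, under stated assumptions: the variable name `CW_CONFIG_CONTENT` is the one the CloudWatch agent container image is understood to read, but it is not named in this thread, so verify it against the agent documentation. The port and interval values are illustrative, and both function names are hypothetical.

```python
import json


def build_statsd_agent_config(port=8125, collection_interval=10,
                              aggregation_interval=60):
    """Build a CloudWatch agent config that enables the StatsD listener.

    Port and intervals (in seconds) are illustrative defaults.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "statsd": {
                    "service_address": f":{port}",
                    "metrics_collection_interval": collection_interval,
                    "metrics_aggregation_interval": aggregation_interval,
                }
            }
        }
    }


def build_agent_environment(config):
    """Return ECS-style environment entries carrying the config as JSON.

    ASSUMPTION: the agent image reads its config from CW_CONFIG_CONTENT.
    """
    compact = json.dumps(config, separators=(",", ":"))
    return [{"name": "CW_CONFIG_CONTENT", "value": compact}]
```

The returned list can be placed under the `environment` key of the cwagent container definition, which keeps the agent config in the task definition instead of baking it into an image.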

@kiranmeduri

@lavignes I believe you still need to create an image if you want to run cwagent as uid 1337. Otherwise, traffic from cwagent has to go via Envoy, and that may not be desirable if we want to monitor Envoy. Can you check if you can set uid to 1337 in your container def? Thanks
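
For reference, a rough sketch of what setting the uid in the container definition could look like, written as a Python dict that mirrors an ECS task definition fragment. The container names, image arguments, and the function name are hypothetical, and the proxy properties are deliberately incomplete (a real App Mesh proxy configuration also needs port settings). Note that an earlier comment in this thread reports the agent failing with a permission-denied error when run this way, so treat this as an illustration of the wiring rather than a known-working setup.

```python
def build_task_fragment(agent_image, envoy_image, ignored_uid="1337"):
    """Sketch of an ECS task definition fragment for the uid-1337 setup.

    Hypothetical names throughout; only the keys relevant to the
    discussion are shown.
    """
    return {
        "containerDefinitions": [
            {
                "name": "cwagent",
                "image": agent_image,
                # Run as the UID whose traffic the proxy rules ignore.
                "user": ignored_uid,
                "essential": True,
            },
            {
                "name": "envoy",
                "image": envoy_image,
                "user": ignored_uid,
                "essential": True,
                # Start Envoy only after the agent container has started.
                "dependsOn": [
                    {"containerName": "cwagent", "condition": "START"}
                ],
            },
        ],
        "proxyConfiguration": {
            "type": "APPMESH",
            "containerName": "envoy",
            "properties": [
                # Traffic from this UID bypasses the Envoy redirect rules.
                {"name": "IgnoredUID", "value": ignored_uid},
                # Other required properties (ports etc.) omitted here.
            ],
        },
    }
```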

@nbrandaleone

Hi. I am working on a blog post that shows how to view Envoy stats in CloudWatch. It is still in draft form, but you can check it out here: http://www.nickaws.net/aws/service_mesh/2019/12/29/AppMesh-Visibility.html

@rclark
Author

rclark commented Jan 3, 2020

I appreciate that there are ways to collect envoy statistics as CloudWatch metrics, and @nbrandaleone your blog post looks super helpful towards that implementation.

But just to reiterate the key point of my original request: I shouldn't have to do this. App Mesh should be able to provide out-of-the-box metrics that provide me with a level of observability that I don't get by connecting a set of ECS services and load balancers.

I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial...

Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.

I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh.

@lavignes

lavignes commented Jan 3, 2020

@rclark thanks for the input. There is absolutely a learning curve here that is non-trivial. I think some of the action items that @kiranmeduri listed above would move us a lot closer to what many people need. One-click options for setting up a cwagent and generating opinionated dashboards are absolutely something that App Mesh should provide.

@mchene

mchene commented Mar 13, 2020

Hey everyone, I’m a Product Manager for CloudWatch. We are looking for people to join our beta program to provide feedback and test App Mesh monitoring and troubleshooting as part of CloudWatch Container Insights. The beta program will allow you to test the collection and visualization of Prometheus metrics from Envoy. We are starting with Kubernetes. Email me if interested.

@shubharao shubharao added this to We're Working On It in aws-app-mesh-roadmap Mar 13, 2020
@bcelenza
Contributor

@mchene Could you provide an email address?

@mchene

mchene commented Mar 18, 2020

machene@amazon.com! No spamming! :)

@jamsajones jamsajones moved this from We're Working On It to Accepted in aws-app-mesh-roadmap Oct 21, 2020
@bcelenza bcelenza removed their assignment Dec 3, 2020
@mkielar

mkielar commented Sep 23, 2021

Oh boy, so much time has passed, and this is still not addressed in any way?
I just tried using the latest-greatest public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.247349.0b251399 today, setting the userid in my ECS Container to 1337, and it still failed with:

2021/09/23 12:52:17 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied

Is there seriously no official CloudWatch Agent image that would handle App Mesh out-of-the-box?

Also, duplicate: #122

@rajal-amzn rajal-amzn assigned LancerRainier and unassigned akestner Oct 1, 2021
@rajal-amzn rajal-amzn moved this from Accepted to Researching in aws-app-mesh-roadmap Oct 1, 2021
@kevinten10

Hi, is there any progress?

@mkielar

mkielar commented May 13, 2022

For anyone still waiting for an official CloudWatch image that works with App Mesh/Envoy: consider migrating to the AWS OpenTelemetry Collector sidecar. Here's an example configuration: https://github.com/aws/aws-app-mesh-examples/blob/main/walkthroughs/howto-metrics-extension-ecs/README.md#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry

The additional advantage is that you can filter StatsD metrics similarly to how CloudWatch filters Prometheus, while still being able to process histogram metrics (like latency) that CloudWatch still cannot handle when scraping Prometheus.
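
To make the idea of filtering StatsD metrics concrete, here is a small sketch that parses DogStatsD-style lines (`name:value|type|@rate|#tag:value,...`) and keeps only metrics whose names match an allow-list of prefixes. This illustrates the concept only; it is not how the OpenTelemetry Collector implements filtering. The function names and the metric names in the usage note are hypothetical, and non-numeric values (such as set members) are not handled.

```python
def parse_dogstatsd(line):
    """Parse one 'name:value|type[|@rate][|#k:v,...]' line into a dict."""
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"not a DogStatsD metric line: {line!r}")
    metric = {
        "name": name,
        "value": float(fields[0]),  # numeric values only in this sketch
        "type": fields[1],          # e.g. c (counter), g (gauge), h (histogram)
        "sample_rate": 1.0,
        "tags": {},
    }
    for field in fields[2:]:
        if field.startswith("@"):
            metric["sample_rate"] = float(field[1:])
        elif field.startswith("#"):
            for tag in field[1:].split(","):
                if tag:
                    key, _, value = tag.partition(":")
                    metric["tags"][key] = value
    return metric


def keep_metric(metric, allowed_prefixes):
    """Return True if the metric name starts with any allowed prefix."""
    return metric["name"].startswith(tuple(allowed_prefixes))
```

For example, parsing a line such as `example.upstream_rq_time:12.5|h|#mesh:demo` and calling `keep_metric` with the prefix list `["example.upstream_"]` keeps the latency histogram while dropping everything outside the allow-list.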
