Add prometheus metrics #65

Merged 1 commit on May 15, 2018

@liwenwu-amazon liwenwu-amazon commented May 11, 2018

Description:

Add the following Prometheus metrics, exposed at localhost:61678/metrics:

  • aws_api_lantency_ms
  • aws_api_error_count
  • eni_allocated
  • total_ip_addresses
  • assigned_ip_addresses

Also add a tool, cni-metrics-helper, which collects the CNI Prometheus metrics from all nodes in the cluster.

# deploy
kubectl apply -f cni_metrics_helper.yaml

# examine the metrics
kubectl logs cni-metrics-helper-xxx -n kube-system
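
To make the collection step concrete, here is a minimal, stdlib-only sketch of reading the gauges out of one node's /metrics response in the Prometheus text exposition format. It is not the cni-metrics-helper implementation; parseGauges and the sample scrape body (including its HELP text and values) are hypothetical and exist only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGauges reads Prometheus text exposition output and returns the value
// of every unlabeled sample line of the form "<name> <value>".
// Comment lines (# HELP, # TYPE) and labeled samples are skipped.
func parseGauges(body string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.Contains(line, "{") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		out[fields[0]] = v
	}
	return out, sc.Err()
}

// sampleScrape is made-up output for a single node, for illustration only.
const sampleScrape = `# HELP eni_allocated The number of ENIs allocated
# TYPE eni_allocated gauge
eni_allocated 4
# TYPE total_ip_addresses gauge
total_ip_addresses 56
# TYPE assigned_ip_addresses gauge
assigned_ip_addresses 50
`

func main() {
	gauges, err := parseGauges(sampleScrape)
	if err != nil {
		panic(err)
	}
	fmt.Println(gauges["eni_allocated"], gauges["total_ip_addresses"], gauges["assigned_ip_addresses"])
}
```

A real collector would more likely use the Prometheus parsing libraries and would also need to handle the labeled samples of aws_api_error_count and aws_api_lantency_ms, which this sketch deliberately skips.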

Tests Performed
Deployed 4000 busybox pods over 80 nodes. Verified the Prometheus metrics from all 80 nodes: aggregated eni_allocated is 320, aggregated total_ip_addresses is 4.48k, and aggregated assigned_ip_addresses is 4k.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@samuelkarp

Please choose a different port than 51678.

assignedIPs = prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: "assigned_ip_addresses",
		Help: "The number of ip addresses assigned",

Please capitalize the initialisms IP and ENI in Help text. Also, eni should be plural: ENIs.

awsAPILatency = prometheus.NewSummaryVec(
	prometheus.SummaryOpts{
		Name: "aws_api_lantency_ms",
		Help: "aws API call latency in ms",

capitalize AWS

awsAPIErr = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "aws_api_error_count",
		Help: "the number of times aws API returns an err",

capitalize The and AWS

if err != nil {
	awsAPIErrInc("TagResources", err)
	log.Warnf("Fail to tag the newly created eni %s %v", eniID, err)

If we're counting this as an Error, why do we log as a Warning? Might be a good idea to make a function logAndTallyErrorf since we appear to always want them both together.
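
One possible shape for such a helper is sketched below, using only the standard library. The errorTally type is a hypothetical stand-in for the aws_api_error_count counter vector, and the standard log package stands in for the project's logger; neither the signature nor the names come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// errorTally is a stand-in for the aws_api_error_count counter vector,
// keyed by API name. It is hypothetical, not the PR's implementation.
type errorTally struct {
	mu     sync.Mutex
	counts map[string]int
}

func newErrorTally() *errorTally {
	return &errorTally{counts: make(map[string]int)}
}

// logAndTallyErrorf increments the counter for api and logs the formatted
// message at error level in one call, so the two cannot drift apart.
func (t *errorTally) logAndTallyErrorf(api string, err error, format string, args ...interface{}) {
	t.mu.Lock()
	t.counts[api]++
	t.mu.Unlock()
	log.Printf("[ERROR] %s: %s: %v", api, fmt.Sprintf(format, args...), err)
}

// count returns how many errors have been tallied for api.
func (t *errorTally) count(api string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[api]
}

func main() {
	tally := newErrorTally()
	err := errors.New("request throttled") // made-up error for illustration
	tally.logAndTallyErrorf("TagResources", err, "failed to tag the newly created ENI %s", "eni-example")
	fmt.Println(tally.count("TagResources"))
}
```

Routing every call site through one function keeps the log level and the counter consistent, which is the mismatch the comment above points out.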

		Help: "The number of ip addresses assigned",
	},
)
)

It appears we're tracking and emitting metrics on ENI count, total addresses, and assigned addresses. One can derive available addresses as Total - Assigned, but it might be nice to emit that directly. It would also be nice to emit max addresses (ENIs * IPsPerENI). Addresses go through a cool-down period; it might be nice to offer visibility into that too.
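
The two derived values suggested here are simple arithmetic over numbers the plugin already tracks. A minimal sketch follows; the ipPoolStats type, its field names, and the example values (including 14 addresses per ENI) are assumptions for illustration, and the cool-down state is not modelled.

```go
package main

import "fmt"

// ipPoolStats mirrors the per-node values behind the existing gauges.
// The type and field names are hypothetical.
type ipPoolStats struct {
	ENIs        int // ENIs attached to the node (eni_allocated)
	IPsPerENI   int // per-ENI address limit, assumed known for the instance type
	TotalIPs    int // total_ip_addresses
	AssignedIPs int // assigned_ip_addresses
}

// available is Total - Assigned: addresses allocated but not assigned to pods.
func (s ipPoolStats) available() int {
	return s.TotalIPs - s.AssignedIPs
}

// max is ENIs * IPsPerENI: the most addresses the attached ENIs can hold.
func (s ipPoolStats) max() int {
	return s.ENIs * s.IPsPerENI
}

func main() {
	// Example values only; they are not taken from the PR.
	s := ipPoolStats{ENIs: 4, IPsPerENI: 14, TotalIPs: 56, AssignedIPs: 50}
	fmt.Println("available:", s.available())
	fmt.Println("max:", s.max())
}
```

Emitting these as their own gauges would save dashboard authors from recomputing them in queries, at the cost of two more series per node.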

@liwenwu-amazon liwenwu-amazon force-pushed the metrics branch 2 times, most recently from 155dddd to 8eb0758 Compare May 15, 2018 04:31
3 participants