This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

Commit

feat(mss monitoring alert): addressing pr comments (INTLY-1893)
feat(monitoring alert): changed to use service metrics (INTLY-1893)

feat(montoring alert): modify rules (INTLY-1893)

feat(monitoring alert): wording change(INTLY-1893)

feat(monitoring alert): modify pod count alert(INTLY-1893)

feat(monitoring alert): modify http request failure alert(INTLY-1893)
austincunningham committed Jun 27, 2019
1 parent 8699c4e commit f331f7e
Showing 2 changed files with 6 additions and 7 deletions.
9 changes: 4 additions & 5 deletions deploy/monitor/mss_prometheus_rule.yaml
@@ -7,7 +7,6 @@ metadata:
 prometheus: application-monitoring
 role: alert-rules
 name: mobile-security-service
-namespace: [[ .Namespace ]]
 spec:
 selector:
 matchLabels:
@@ -61,7 +60,7 @@ spec:
 summary: "The mobile-security-service is reporting high memory usage for more that 5 minutes. For more information see on the MMS at https://github.com/aerogear/mobile-security-service"
 sop_url: "https://github.com/aerogear/mobile-security-service-operator/SOP/SOP-mss.md"
 - alert: MobileSecurityServiceApiHighRequestDuration
-expr: "go_gc_duration_seconds{job='mobile-security-service-application'} > 30"
+expr: "api_requests_duration_seconds{job='mobile-security-service-application', quantile='0.5'} > 30"
 for: 5m
 labels:
 severity: warning
@@ -70,7 +69,7 @@ spec:
 summary: "The mobile-security-service is reporting high request latency for more that 5 minutes. For more information see on the MMS at https://github.com/aerogear/mobile-security-service"
 sop_url: "https://github.com/aerogear/mobile-security-service-operator/SOP/SOP-mss.md"
 - alert: MobileSecurityServiceApiHighConcurrentRequests
-expr: "promhttp_metric_handler_requests_in_flight{job='mobile-security-service-application'} > 50"
+expr: "api_requests_in_flight{job='mobile-security-service-application'} > 50"
 for: 5m
 labels:
 severity: warning
@@ -79,11 +78,11 @@ spec:
 summary: "The mobile-security-service is reporting high request concurrency for more that 5 minutes. For more information see on the MMS at https://github.com/aerogear/mobile-security-service"
 sop_url: "https://github.com/aerogear/mobile-security-service-operator/SOP/SOP-mss.md"
 - alert: MobileSecurityServiceApiHighRequestFailure
-expr: "sum(promhttp_metric_handler_requests_total{job='mobile-security-service-application', code=~'503|500'})>10"
+expr: "rate(api_requests_failure_total{job='mobile-security-service-application'}[1h])>10"
 for: 1h
 labels:
 severity: warning
 annotations:
-description: "The mobile-security-service api has reported more that 10 request failures(500/503) in an hour"
+description: "The mobile-security-service api has reported more that 10 request failures in an hour"
 summary: "The mobile-security-service is reporting a high request failure over an hour. For more information see on the MMS at https://github.com/aerogear/mobile-security-service"
 sop_url: "https://github.com/aerogear/mobile-security-service-operator/SOP/SOP-mss.md"
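
Note on the MobileSecurityServiceApiHighRequestFailure expression, offered as a hedged sketch and not as part of this commit: in PromQL, rate() returns a per-second average over the range, so rate(...[1h]) > 10 fires at more than 10 failures per second, whereas the description speaks of more than 10 failures in an hour. If the hourly count is what is intended, increase() over the same range may be a closer match. The fragment below assumes api_requests_failure_total is a counter, which the diff does not confirm, and its indentation is illustrative.

```yaml
# Hypothetical variant, not part of commit f331f7e.
# Assumes api_requests_failure_total is a Prometheus counter.
- alert: MobileSecurityServiceApiHighRequestFailure
  # increase() estimates the total growth of the counter over the 1h window;
  # rate() would instead give the per-second average over that window.
  expr: "increase(api_requests_failure_total{job='mobile-security-service-application'}[1h]) > 10"
  for: 1h
  labels:
    severity: warning
```

With for: 1h left unchanged, the condition would still need to hold continuously for an hour before the alert fires; whether that delay is desired is a separate choice.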
4 changes: 2 additions & 2 deletions deploy/monitor/prometheus_rule.yaml
@@ -27,10 +27,10 @@ spec:
 - alert: MobileSecurityServicePodcount
 annotations:
 description: The Pod count for the mobile-security-service has changed in the last 5 minutes.
-summary: Pod count for namespace mobile-security-service is {{ printf "%.0f" $value }}. Expected exactly 3 pods. For more information see on the MMS operator https://github.com/aerogear/mobile-security-service-operator"
+summary: Pod count for namespace mobile-security-service is {{ printf "%.0f" $value }}. Expected 3 pods. For more information see on the MMS operator https://github.com/aerogear/mobile-security-service-operator"
 sop_url: "https://github.com/aerogear/mobile-security-service-operator/SOP/SOP-operator.md"
 expr: |
 (1-absent(kube_pod_status_ready{condition="true", namespace="mobile-security-service"})) or sum(kube_pod_status_ready{condition="true", namespace="mobile-security-service"}) != 3
 for: 5m
 labels:
-severity: critical
+severity: warning
