Using Grafana Explore with PromQL, the Search UI, and the Visual Web Terminal, ACM lets you explore and interrogate the data collected from across the fleet of managed clusters. You can inspect the state of the fleet, test hypotheses, and diagnose problems. Follow this document to dip your feet into this fascinating and empowering world.
Click the search icon in the top right-hand corner of the ACM UI
Hint: Click on the shortcut in the UI called "Created last hour"
Can you figure out how to find the namespaces that were created in the last hour? Hint: Add a filter: kind:namespace
Hint: Type in kind:pod prometheus-
Hint: Type in kind:pod prometheus-
Hint: Type in kind:cluster
Hint: Type in kind:policy compliant:NonCompliant
Hint: Type in kind:application
Hint: Click on the shortcut in the UI called "Unhealthy Pods" and then type in created:hour
Hint: You should be able to follow the directions in the UI to save it, and then come back to it at a later time.
Hint: Type in kind:multiclusterobservability
multiclusterobservability is an instance of the MultiClusterObservability custom resource (CR) that you may have created when you enabled Observability in RHACM. You can extend this search to check what was created around this CR in the last hour, or apply the same approach to other custom resources.
Select Pods using any of the techniques shown above, drill down to one Pod, and view its logs
Select Pods using any of the techniques shown above, pick a Pod, and use the expand option on the right-hand side to delete the Pod
Figure this one out yourself. It is easy.
Open the ACM UI and navigate to Observe Environment -> Overview -> Grafana -> Explore (icon on the left-hand panel).
Remember, building a dashboard for each and every thing can be noisy. You can bookmark your favorite queries using Grafana Explore.
Type into the Explorer Bar: count(count by (cluster) (cluster_version))
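To see why the nested count returns the number of clusters, here is a minimal Python sketch of the same two-step aggregation. The sample series and cluster names are invented for illustration, and the code only mimics the PromQL semantics; it is not how Prometheus evaluates the query internally.

```python
from collections import Counter

# Hypothetical cluster_version series: one dict of labels per series.
series = [
    {"cluster": "local-cluster", "type": "current"},
    {"cluster": "local-cluster", "type": "completed"},
    {"cluster": "managed-1", "type": "current"},
    {"cluster": "managed-2", "type": "current"},
    {"cluster": "managed-2", "type": "failure"},
]


def count_by(label, rows):
    """Inner step, like count by (label) (...): one result per label value."""
    return Counter(row[label] for row in rows)


per_cluster = count_by("cluster", series)  # number of series for each cluster
cluster_total = len(per_cluster)           # outer count(): number of groups

print(dict(per_cluster))
print(cluster_total)
```

The inner step collapses all series of a cluster into one result, so the outer count sees exactly one entry per cluster regardless of how many cluster_version series each cluster reports.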
Check the number of container restarts for a certain container in a certain namespace in a certain cluster
Type into the Explorer Bar: kube_pod_container_status_restarts_total{cluster="local-cluster",namespace="open-cluster-management",pod=~".*redis.*"}
Type into the Explorer Bar:
sum by (cluster,namespace, container, pod) (kube_pod_container_status_restarts_total) >0
Type into the Explorer Bar:
sum(cluster:usage:resources:sum{cluster="local-cluster"}) by (resource)
Note: cluster:usage:resources:sum is an out-of-the-box recording rule created in OpenShift, and it only gathers data for certain types of resources
Type into the Explorer Bar: ALERTS
Type into the Explorer Bar: rate(haproxy_backend_connection_errors_total[5m])
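As a rough intuition for what rate() reports over a window such as [5m], the Python sketch below computes a per-second increase from raw counter samples and treats any drop in value as a counter reset. This is a simplification under stated assumptions: the sample values are made up, and the real Prometheus function also extrapolates to the window boundaries, which is omitted here.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a window.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    A drop in value is treated as a counter reset (restart from zero).
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole
        # current value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical error counter scraped every 60 seconds, with one reset.
window = [(0, 100), (60, 106), (120, 112), (180, 3), (240, 9)]
print(simple_rate(window))
```

Because the result is normalized to seconds, values from windows of different lengths remain comparable.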
Type into the Explorer Bar:
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace="$namespace", pod="$pod", container!="POD", cluster="$cluster"}) by (container)
or
sum(container_memory_working_set_bytes{cluster="$cluster", namespace="$namespace", pod="$pod", container!="POD", container!=""}) by (container)
Substitute real values for $pod, $namespace, and $cluster. For example: sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace="open-cluster-management", container!="POD", cluster="local-cluster",pod=~".*redis.*"}) by (container)
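If you substitute these variables often, a small helper can do it for you. The sketch below uses Python's standard string.Template on the memory query shown above; the helper name fill and the pod name are hypothetical, and the output is simply text to paste into Grafana Explore.

```python
from string import Template

# The memory query from above, with the $variables left in place.
QUERY = Template(
    'sum(container_memory_working_set_bytes{cluster="$cluster", '
    'namespace="$namespace", pod="$pod", container!="POD", container!=""}) '
    'by (container)'
)


def fill(template, **values):
    """Return the query text with every $variable replaced.

    Template.substitute raises KeyError if a variable is missing, which
    catches a forgotten substitution before the query is pasted anywhere.
    """
    return template.substitute(**values)


query = fill(
    QUERY,
    cluster="local-cluster",
    namespace="open-cluster-management",
    pod="example-pod",  # hypothetical pod name; use one from your cluster
)
print(query)
```

Note that this produces an exact match (pod="..."); to match by pattern as in the redis example, change the operator in the template to =~ and pass a regular expression.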
Type into the Explorer Bar:
sum(rate(etcd_object_counts{resource!="events"}[5m]) ) by (cluster,resource) >0
Check whether there is a correlation between the object count in the API Server and the CPU or memory usage of a Pod, or alerts
Can you figure this out yourself now? Hint: This may help: [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/).
Click the Visual Web Terminal icon in the top right-hand corner of the ACM UI
In the Visual Web Terminal
Enter: kubectl get ns
Select on UI: open-cluster-management
Enter: kubectl get pods
Select on the UI: *redisgraph* pod
Explore the different tabs
Did you notice that, in addition to viewing logs and events, you can open a terminal in that pod and run a command?