`top` command to view crossplane resource usage and metrics #5036

jbw976 · 2023-11-18T01:43:18Z

There is currently no streamlined (easy) way to get quick observability and insight into Crossplane's resource usage and metrics. We could build a crossplane top command that quickly exposes some key metrics to users of the crossplane CLI so they get a high-level view of the overall health of the control plane.

Quick thoughts:

There may be overlap with kubectl top and perhaps it can be leveraged
- This issue goes further though by also exposing Crossplane specific insight beyond just simple memory/CPU
This is related to the observability epic (High level metrics #4620) and would be enhanced by those higher level insightful metrics, but is not blocked on those.
- The key to this issue is to expose what we have and what we can, i.e. make it easier for users to quickly access this information


Piotr1215 · 2023-12-08T15:07:09Z

Here is a quick design doc, would love to get everyone's feedback or implementation ideas.

Introduction & Goals

The goal is to create a new command in the crossplane CLI which would:

Show resource utilization for crossplane, provider and function pods, the same way kubectl top pods does
Show custom metrics for crossplane, provider and function pods

Constraints

There is a fine line between having fully fledged Prometheus metrics visualized with Grafana and having a simple terminal output.

We do not want to implement Prometheus v2, but rather expose simple metrics and resource utilization that give a quick glimpse into the state of the crossplane machinery.

Context

The crossplane top command will be part of the crossplane CLI and will inherit its context and communication patterns.

The below diagram shows the components that the command interacts with.

sequenceDiagram
    autonumber
    actor User
    participant Terminal as crossplane top
    participant MetricsEndpoint as crossplane/provider/function pod /metrics endpoint

    User->>Terminal: Keyboard input
    Terminal->>MetricsEndpoint: HTTP call to /metrics endpoint on pod
    Note right of MetricsEndpoint: HTTP call to kubelet to retrieve resource utilization
    MetricsEndpoint-->>Terminal: Metrics/Resource utilization with latest values
    Terminal-->>User: Display output
    Note left of Terminal: also possible to use <br/>watch for auto-updates

Solution Strategy

The primary objective is to introduce a streamlined command that provides users with instantaneous insights into resource utilization and key metrics without resorting to more heavyweight observability tooling.

Here are a few key design decisions grouped into sections.

Metrics Acquisition

Unlike traditional approaches that rely on Prometheus, this solution will extract metrics directly from crossplane/providers/functions pods. This ensures that simple metrics and resource utilization information will be accessible without the need to install Prometheus or other observability tools.
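A minimal sketch of the direct call described above, using only the Go standard library. The helper names (`fetchMetrics`, `fetchFromFakePod`) and the sample payload are hypothetical, and a local test server stands in for a pod; how the CLI actually reaches a pod's /metrics endpoint (port-forward, API server proxy, or something else) is not decided here.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchMetrics performs a single GET against a /metrics URL and returns the
// raw text payload. The URL is supplied by the caller.
func fetchMetrics(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", fmt.Errorf("could not build request: %w", err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("could not call %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("could not read response body: %w", err)
	}
	return string(body), nil
}

// fetchFromFakePod stands in for a real pod (an assumption for illustration):
// it serves a fixed payload from a local test server and fetches it back.
func fetchFromFakePod() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/metrics" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "process_open_fds 12\n")
	}))
	defer srv.Close()

	// A timeout keeps the command from hanging on an unresponsive pod.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return fetchMetrics(ctx, srv.Client(), srv.URL+"/metrics")
}

func main() {
	out, err := fetchFromFakePod()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Since the command only needs a snapshot, one request per pod per invocation is enough; repeated polling is left to watch.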

User Experience Consistency
By mirroring the output format of kubectl top pods, we adhere to the Principle of Least Astonishment, which facilitates ease of use for users accustomed to kubectl.

Terminal Interface
Integration with pterm will manage terminal-specific contexts, such as color settings, to enhance the interface experience without reinventing the wheel for edge-case scenarios.

Tailored Metrics Selection
Metrics will be pre-selected for each object type (crossplane, provider, function) to ensure relevance and clarity. This may result in varied output columns corresponding to the distinct metrics of each pod type.

Parsing and Presentation:
The utilization of the prometheus_client library will allow for the parsing of metrics and their transformation into a format that is accessible and comprehensible to the user.
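To illustrate the parsing step, here is a rough sketch that reads the Prometheus text exposition format with the Go standard library only, rather than a Prometheus client library. The function name `sumByName` and the sample payload are made up for illustration, and the parser is deliberately simplified: it assumes label values do not contain a `}` character and it ignores timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sumByName parses a text exposition payload and returns, for every metric
// name, the sum of its sample values across all label sets.
func sumByName(payload string) (map[string]float64, error) {
	totals := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(payload))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and # HELP / # TYPE comments
		}
		name, rest := line, ""
		if i := strings.IndexByte(line, '{'); i >= 0 {
			// Sample with labels: name{k="v",...} value [timestamp]
			j := strings.LastIndexByte(line, '}')
			if j < i {
				return nil, fmt.Errorf("unterminated label set: %q", line)
			}
			name, rest = line[:i], line[j+1:]
		} else if k := strings.IndexAny(line, " \t"); k >= 0 {
			// Sample without labels: name value [timestamp]
			name, rest = line[:k], line[k:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value: %q", line)
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		totals[name] += value
	}
	return totals, scanner.Err()
}

func main() {
	// Hypothetical payload for demonstration only.
	payload := `# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",method="GET"} 120
rest_client_requests_total{code="404",method="GET"} 3
process_open_fds 12
`
	totals, err := sumByName(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names := make([]string, 0, len(totals))
	for name := range totals {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order for terminal display
	for _, name := range names {
		fmt.Printf("%-35s %g\n", name, totals[name])
	}
}
```

Summing across label sets is only one possible reduction; a real implementation would likely treat counters, gauges and histograms differently.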

Data Snapshot:
The output will initially be a static snapshot, reflecting metrics or resource utilization at a single point in time. Users will be able to use the watch command to see new values each time the watch timer elapses.

Mock Designs

Sample command output with crossplane and provider pods running:

It is also possible to add header information which would act like a combination of kubectl cluster info and kubectl top pods

Out of Scope

A few ideas that are worth noting but are out of scope for the initial implementation:

Provide the output of Grafana dashboard JSON for users to import, which would capture the same metrics as the command output.
Add more graphical elements and interactivity similar to, for example, the kdash project.

Future improvements

For now the plan is to cover crossplane, providers and functions pods. In the future, we could add other crossplane native API types such as External Secret Store Plugins and maybe others.

Crossplane Docs · v1.14 · Crossplane API

POC Implementation

The current POC implementation can be found in the crossplane-top repository.

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/metrics/pkg/client/clientset/versioned"
)

func getKubeConfig() (string, error) {
	kubeconfig := os.Getenv("KUBECONFIG")
	if kubeconfig != "" {
		return kubeconfig, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", fmt.Errorf("could not get user home directory: %w", err)
	}
	return filepath.Join(home, ".kube", "config"), nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}
}

func run() error {

	// Build the config from the kubeconfig path
	kubeconfig, err := getKubeConfig()
	if err != nil {
		return fmt.Errorf("failed to get kubeconfig: %w", err)
	}
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return fmt.Errorf("could not build config from flags: %w", err)
	}

	// Create the clientset for Kubernetes
	k8sClientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return fmt.Errorf("could not create the clientset for Kubernetes: %w", err)
	}

	// Create the clientset for Metrics
	metricsClientset, err := versioned.NewForConfig(config)
	if err != nil {
		return fmt.Errorf("could not create the clientset for Metrics: %w", err)
	}

	// Fetch all pods from all namespaces in case Crossplane pods are installed elsewhere
	pods, err := k8sClientset.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return fmt.Errorf("could not fetch all pods from all namespaces: %w", err)
	}

	fmt.Printf("%-20s %-55s %-12s %-15s\n", "NAMESPACE", "NAME", "CPU(cores)", "MEMORY(Mi)")

	// Loop through pods and print metrics
	for _, pod := range pods.Items {
		// A pod can carry several matching labels, so decide once per pod
		// to avoid printing the same pod more than once.
		matched := false
		for labelKey := range pod.GetLabels() {
			if strings.HasPrefix(labelKey, "pkg.crossplane.io/provider") || strings.HasPrefix(labelKey, "pkg.crossplane.io/function") {
				matched = true
				break
			}
		}
		if !matched {
			continue
		}

		podMetrics, err := metricsClientset.MetricsV1beta1().PodMetricses(pod.Namespace).Get(context.TODO(), pod.Name, metav1.GetOptions{})
		if err != nil {
			fmt.Printf("Error getting metrics for pod %s: %v\n", pod.Name, err)
			continue
		}

		for _, container := range podMetrics.Containers {
			// CPU is reported in millicores.
			cpuUsage := container.Usage.Cpu().ScaledValue(resource.Milli)
			// Memory is converted to mebibytes to match the "Mi" suffix;
			// resource.Mega would yield decimal megabytes instead.
			memoryUsage := fmt.Sprintf("%dMi", container.Usage.Memory().Value()/(1024*1024))
			fmt.Printf("%-20s %-55s %-12d %-15s\n", pod.Namespace, pod.Name, cpuUsage, memoryUsage)
		}
	}
	return nil
}
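The unit handling in the POC is easy to get subtly wrong, so here is a small standalone sketch of formatting helpers, assuming the goal is output that resembles kubectl top pods (millicores with an `m` suffix, memory in `Mi`). The names `formatCPU`, `formatMemory` and `formatRow`, and the sample namespace and pod name, are hypothetical and not part of the POC.

```go
package main

import "fmt"

// formatCPU renders a millicore count with the "m" suffix, e.g. "250m".
func formatCPU(milliCores int64) string {
	return fmt.Sprintf("%dm", milliCores)
}

// formatMemory renders a byte count in whole mebibytes (1Mi = 1024*1024
// bytes), truncating any remainder.
func formatMemory(bytes int64) string {
	const mebibyte = 1024 * 1024
	return fmt.Sprintf("%dMi", bytes/mebibyte)
}

// formatRow lays out one table row using the same column widths as the POC.
func formatRow(namespace, name string, milliCores, memBytes int64) string {
	return fmt.Sprintf("%-20s %-55s %-12s %-15s",
		namespace, name, formatCPU(milliCores), formatMemory(memBytes))
}

func main() {
	fmt.Printf("%-20s %-55s %-12s %-15s\n", "NAMESPACE", "NAME", "CPU(cores)", "MEMORY(Mi)")
	// Sample values for demonstration only.
	fmt.Println(formatRow("crossplane-system", "provider-example-0123456789", 250, 64*1024*1024))
}
```

Keeping the conversion in one place makes it harder for the column header and the printed unit to drift apart.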

Risks and Technical Debt

The Crossplane CLI is part of the crossplane repository and in the future will be moved to a separate repository. This might accumulate some technical debt in the command that will need to be addressed in the future.

A risk is overloading the command with additional capabilities and making it a Prometheus in the terminal, which should be avoided.
An important design consideration was whether, when Prometheus is present on the cluster, metrics are reset once they have been scraped. After preliminary research, this appears not to be the case.
When additional Crossplane metrics are implemented, the command will need to be changed to accommodate them.

jbw976 · 2023-12-27T14:58:51Z

Great stuff @Piotr1215, I think that is a very reasonable proposal and a great place to get started in v1.15 ✅

A few thoughts to share:

An important part of the experience will be to make it easy for users to get metrics if they haven't already been enabled. A lot of people stumble on that today and (as you have mentioned) end up googling and finding Dan's blog post from the past 😬
- Can we consider a subcommand to explicitly enable metrics? This would make it very easy for folks to get metrics going with one command and not have to search around.
  - e.g. crossplane top enable or something like that
- If metrics are not already enabled when the crossplane top command is run, the output/error message could make it very clear that they can simply run crossplane top enable to get metrics going.
Do you have any further thoughts already on what are the specific metrics we want to enable beyond CPU/memory? What is useful from the metrics (mostly controller-runtime based) that we're already collecting?
- curl -s https://raw.githubusercontent.com/negz/crossplane-scale/main/grafana-dashboard.json | grep -F -i '"title":' could generate some good ideas 😉
the diagram shows the pod making the HTTP call to /metrics, is that right? I assumed the CLI would make the HTTP call to /metrics instead.

Piotr1215 · 2024-01-02T16:21:51Z

Thank you for the comments @jbw976 🙏

adding crossplane top enable might be a good idea, however, when discussing the implementation we have stumbled upon a few questions. Would it be ok if we add it as a fast follow functionality?
selecting metrics would be my next step, many thanks for the idea with grepping the dashboard source, this is a great start! Looking at the titles, I think the following 3 would be most interesting:

      "title": "API Server Requests",
      "title": "Workqueue Duration",
      "title": "API Server Request Latency",

nice catch on the diagram, it is now simplified and should reflect the implementation better.
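As a rough sketch of how the three selected titles could drive the output, the snippet below maps each title to a metric name and renders one line per title. The metric names in the mapping are assumptions about what backs those dashboard panels, not something confirmed in this thread, and `column`, `selected` and `render` are hypothetical names.

```go
package main

import "fmt"

// column pairs a display title with the metric assumed to back it.
type column struct {
	Title  string
	Metric string
}

// selected is an assumed mapping from the dashboard titles to metric names;
// the real names would need to be checked against the pods' /metrics output.
var selected = []column{
	{Title: "API Server Requests", Metric: "rest_client_requests_total"},
	{Title: "Workqueue Duration", Metric: "workqueue_queue_duration_seconds_sum"},
	{Title: "API Server Request Latency", Metric: "rest_client_request_duration_seconds_sum"},
}

// render returns one "title: value" line per column, in order, and prints
// "n/a" when the pod did not expose the metric.
func render(totals map[string]float64, cols []column) []string {
	rows := make([]string, 0, len(cols))
	for _, c := range cols {
		if v, ok := totals[c.Metric]; ok {
			rows = append(rows, fmt.Sprintf("%s: %g", c.Title, v))
		} else {
			rows = append(rows, fmt.Sprintf("%s: n/a", c.Title))
		}
	}
	return rows
}

func main() {
	// Sample totals for demonstration only.
	totals := map[string]float64{"rest_client_requests_total": 123}
	for _, row := range render(totals, selected) {
		fmt.Println(row)
	}
}
```

Driving the columns from a small table like this would keep the per-pod-type selection (crossplane, provider, function) a matter of data rather than code.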

jbw976 · 2024-01-10T15:40:09Z

cool @Piotr1215!!

yes, incrementally adding the functionality to enable/disable metrics sounds great for a follow-on PR
I was also thinking something with the number of warning events could be interesting, and also reconciles by controller? 🧐

Piotr1215 · 2024-01-16T16:58:43Z

Thank you for the suggestions @jbw976. The number of warning events could be a nice early warning that something is wrong with the cluster. Also, the number of reconciles would definitely affect performance. Definitely worth looking into.

For now in the first small PR I'm targeting basic functionality mirroring kubectl top pods but only for crossplane pods. The follow-up PRs will add more functionality :).

jbw976 · 2024-01-24T20:42:00Z

Initial implementation of this feature has been merged for v1.15: #5245

jbw976 mentioned this issue Nov 18, 2023

Developer Experience improvements for v1.15 #4654

Closed

jbw976 added user experience observability crossplane-cli labels Nov 18, 2023

jbw976 added this to the v1.15 milestone Nov 18, 2023

jeanduplessis assigned Piotr1215 Nov 18, 2023

Piotr1215 mentioned this issue Nov 19, 2023

Scale test composition functions #5004

Open

Piotr1215 mentioned this issue Jan 15, 2024

Crossplane top command - pods resources utilization implementation #5245

Merged

6 tasks

jbw976 closed this as completed Jan 24, 2024

Piotr1215 mentioned this issue Feb 12, 2024

Promote top crossplane CLI subcommand to GA #5372

Open
