## Wallaroo Admin Dashboard Metrics Retrieval Tutorial

The following tutorial demonstrates how to use the Wallaroo MLOps API to retrieve Wallaroo metrics data.  These requests follow the Prometheus HTTP API query format.

This tutorial lists the metrics queries available and demonstrates how to perform each of the queries.

### Prerequisites

This tutorial assumes the following:

* A Wallaroo Ops environment is installed.
* The Wallaroo SDK is installed.  These examples use the Wallaroo SDK to connect to the Wallaroo instance and to supply the authentication header for the metrics requests.

## Inference Data Generation

This part of the tutorial generates the inference results used for the rest of the tutorial.

### Import libraries

The first step is to import the libraries required.

In [33]:
import json
import numpy as np
import pandas as pd

import pytz
import datetime

import requests
from requests.auth import HTTPBasicAuth

import wallaroo

### Connect to the Wallaroo Instance

A connection to Wallaroo is established via the Wallaroo client.  The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is done with the `wallaroo.Client()` command, which displays a URL used to grant the SDK permission to your specific Wallaroo environment.  When the URL is displayed, open it in a browser and confirm permissions.  Store the connection in a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

In [35]:
wl = wallaroo.Client(api_endpoint="https://autoscale-uat-gcp.wallaroo.dev/", 
                     auth_type="sso")



In [36]:
model_name = "ccfraud-model"
model_file_name = "./models/ccfraud.onnx"


The following queries are available for resource consumption.  Note that each request uses the `/v1/metrics/api/v1/query` endpoint.

| Query Name | API Route | Example Query | Description | 
|---|---|---|---|
| Total CPU Requested | query | `sum(wallaroo_kube_pod_resource_requests{resource="cpu"})` | Number of CPUs requested in the Wallaroo cluster |
| Total CPU allocated | query | `sum(kube_node_status_capacity{resource="cpu"})` | Total number of available CPUs in the Wallaroo cluster |
| Total GPU Requested | query | `sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu\|qualcomm.com/qaic"})` | Number of GPUs requested in the Wallaroo cluster |
| Total GPU Allocated | query | `sum(kube_node_status_capacity{resource=~"nvidia_com_gpu\|qualcomm_com_qaic"})` | Total number of available GPUs in the Wallaroo cluster |
| Total Memory Requested | query | `sum(wallaroo_kube_pod_resource_requests{resource="memory"})` | Amount of memory requested in the Wallaroo cluster. |
| Total Memory Allocated | query | `sum(kube_node_status_capacity{resource="memory"})` | Total amount of memory available in the Wallaroo cluster. |
| Total Inference Log Storage used | query | `kubelet_volume_stats_used_bytes{persistentvolumeclaim="plateau-managed-disk"}` | Amount of inference log storage used. |
| Total Inference Log Storage allocated | query | `kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="plateau-managed-disk"}` | Total amount of inference log storage available. |
| Total Artifact Storage used | query | `kubelet_volume_stats_used_bytes{persistentvolumeclaim="minio"}` | Amount of model and orchestration artifact storage used. |
| Total Artifact Storage allocated | query | `kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="minio"}` | Total amount of model and orchestration artifact storage available. |
| Average GPU usage over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu\|qualcomm.com/qaic"})[1h:] offset 1h)` | Average GPU usage over the defined time range in the Wallaroo cluster. |
| Average GPU requested over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu\|qualcomm.com/qaic"})[1h:] offset 1h)` | Average number of GPU requested over the defined time range in the Wallaroo cluster | 
| Average CPU usage over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)` | Average CPU usage over the defined time range in the Wallaroo cluster. |
| Average CPU requested over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)` | Average CPU requests over the defined time range in the Wallaroo cluster. |
| Average Memory usage over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)` | Average memory usage over the defined time range in the Wallaroo cluster. |
| Average Memory requests over time | query | `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)` | Average memory requests over the defined time range in the Wallaroo cluster. |
| Average pipelines CPU usage over time | query | `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)` | Average CPU usage over the defined time range for an individual Wallaroo pipeline. |
| Average pipelines CPU requested over time | query | `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)` | Average number of CPUs requested over the defined time range for an individual Wallaroo pipeline. |
| Average pipelines GPU usage over time | query | `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu\|qualcomm.com/qaic"})[1h:] offset 1h)` | Average GPU usage over the defined time range for an individual Wallaroo pipeline. |
| Average pipelines GPU requested over time | query | `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu\|qualcomm.com/qaic"})[1h:] offset 1h)` | Average number of GPUs requested over the defined time range for an individual Wallaroo pipeline. |
| Average pipelines Mem usage over time | query | `avg_over_time(sum by(namespace) (wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)` | Average memory usage over the defined time range for an individual Wallaroo pipeline. |
| Average pipelines Mem requested over time | query | `avg_over_time(sum by (namespace)(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)` | Average amount of memory requested over the defined time range for an individual Wallaroo pipeline. |
| Pipeline inference log storage | query | `avg_over_time(sum by(topic) (topic_bytes)[1h:] offset 1h)` | Average inference log storage used over the defined time range for an individual Wallaroo pipeline |

### Total CPU Requested

* Total CPU Requested
* query 
* `sum(wallaroo_kube_pod_resource_requests{resource="cpu"})`
* Number of CPUs requested in the Wallaroo cluster


In [37]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total CPUs requested by pods in the cluster
query = 'sum(wallaroo_kube_pod_resource_requests{resource="cpu"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703401.406, '9.306000000000001']}]}}
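
Each entry in `result` pairs a Unix timestamp with the sample encoded as a string, so the value usually needs converting before any arithmetic. The following is a minimal sketch, assuming the response has the instant-vector shape shown above; `vector_samples` is a hypothetical helper name and not part of the Wallaroo SDK.

```python
from datetime import datetime, timezone


def vector_samples(payload):
    """Return (labels, utc_datetime, float_value) tuples from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    samples = []
    for item in data["result"]:
        timestamp, raw_value = item["value"]
        when = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
        samples.append((item["metric"], when, float(raw_value)))
    return samples


# Sample payload copied from the response above.
payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1764703401.406, "9.306000000000001"]}],
    },
}

for labels, when, value in vector_samples(payload):
    print(labels, when.isoformat(), value)
```

In the notebook, the same helper could be called on `response.json()` instead of the copied sample.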

### Total CPU allocated

* Total CPU allocated
* query
* `sum(kube_node_status_capacity{resource="cpu"})`
* Total number of available CPUs in the Wallaroo cluster


In [38]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total CPU capacity across the cluster nodes
query = 'sum(kube_node_status_capacity{resource="cpu"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703404.434, '48']}]}}
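
The cells in this tutorial repeat the same request pattern and change only the query string. One way to reduce the repetition is a small wrapper, sketched below. The names `metrics_query_url` and `run_instant_query` are hypothetical, and stripping a trailing slash from the endpoint is an assumption about how `wl.api_endpoint` may be stored. The HTTP function is passed in as an argument, so the sketch itself makes no network calls.

```python
def metrics_query_url(api_endpoint):
    """Build the instant-query URL, tolerating a trailing slash on the endpoint."""
    return api_endpoint.rstrip("/") + "/v1/metrics/api/v1/query"


def run_instant_query(api_endpoint, headers, query, http_get):
    """Send one PromQL instant query and return the decoded JSON body.

    http_get is any callable with the same call signature as requests.get.
    """
    response = http_get(
        metrics_query_url(api_endpoint),
        headers=headers,
        params={"query": query},
    )
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to fetch query response: {response.status_code} {response.text}"
        )
    return response.json()


# Hypothetical usage inside the notebook, with `wl` and `requests` from earlier cells:
# result = run_instant_query(
#     wl.api_endpoint,
#     wl.auth.auth_header(),
#     'sum(kube_node_status_capacity{resource="cpu"})',
#     http_get=requests.get,
# )
print(metrics_query_url("https://example.wallaroo.test/"))
```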

### Total GPU Requested

* Total GPU Requested
* query
* `sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})`
* Number of GPUs requested in the Wallaroo cluster


In [39]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total GPUs requested by pods in the cluster
query = 'sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703407.426, '2']}]}}

### Total GPU Allocated

* Total GPU Allocated
* query
* `sum(kube_node_status_capacity{resource=~"nvidia_com_gpu|qualcomm_com_qaic"})` 
* Total number of available GPUs in the Wallaroo cluster


In [40]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total GPU capacity across the cluster nodes
query = 'sum(kube_node_status_capacity{resource=~"nvidia_com_gpu|qualcomm_com_qaic"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703410.797, '5']}]}}

### Total Memory Requested

* Total Memory Requested
* query
* `sum(wallaroo_kube_pod_resource_requests{resource="memory"})`
* Amount of memory requested in the Wallaroo cluster.


In [41]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total memory requested by pods in the cluster
query = 'sum(wallaroo_kube_pod_resource_requests{resource="memory"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703413.333, '27791458304']}]}}

### Total Memory Allocated

* Total Memory Allocated
* query
* `sum(kube_node_status_capacity{resource="memory"})`
* Total amount of memory available in the Wallaroo cluster.


In [42]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: total memory capacity across the cluster nodes
query = 'sum(kube_node_status_capacity{resource="memory"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703414.743, '197850009600']}]}}
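
The two memory totals are easier to compare once they are converted to a common unit and expressed as a ratio. The sketch below assumes both values are byte counts, which is how Kubernetes reports memory capacity; the helper names are hypothetical.

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return float(n_bytes) / 2**30


def utilization(requested, capacity):
    """Return requested / capacity as a fraction, guarding against zero capacity."""
    capacity = float(capacity)
    if capacity == 0:
        raise ValueError("capacity must be non-zero")
    return float(requested) / capacity


# Sample values copied from the two memory responses above.
memory_requested = "27791458304"
memory_capacity = "197850009600"

print(f"requested: {bytes_to_gib(memory_requested):.2f} GiB")
print(f"capacity:  {bytes_to_gib(memory_capacity):.2f} GiB")
print(f"requested share of capacity: {utilization(memory_requested, memory_capacity):.1%}")
```

The same ratio applies to the CPU and GPU totals, where no unit conversion is needed.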

### Total Inference Log Storage used

* Total Inference Log Storage used
* query
* `kubelet_volume_stats_used_bytes{persistentvolumeclaim="plateau-managed-disk"}`
* Amount of inference log storage used.



In [43]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: bytes used on the inference log storage volume
query = 'kubelet_volume_stats_used_bytes{persistentvolumeclaim="plateau-managed-disk"}'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'__name__': 'kubelet_volume_stats_used_bytes',
     'beta_kubernetes_io_arch': 'amd64',
     'beta_kubernetes_io_instance_type': 'e2-standard-8',
     'beta_kubernetes_io_os': 'linux',
     'cloud_google_com_gke_boot_disk': 'pd-balanced',
     'cloud_google_com_gke_container_runtime': 'containerd',
     'cloud_google_com_gke_cpu_scaling_level': '8',
     'cloud_google_com_gke_logging_variant': 'DEFAULT',
     'cloud_google_com_gke_max_pods_per_node': '110',
     'cloud_google_com_gke_memory_gb_scaling_level': '32',
     'cloud_google_com_gke_nodepool': 'persistent',
     'cloud_google_com_gke_os_distribution': 'cos',
     'cloud_google_com_gke_provisioning': 'standard',
     'cloud_google_com_gke_stack_type': 'IPV4',
     'cloud_google_com_machine_family': 'e2',
     'cloud_google_com_private_node': 'false',
     'failure_domain_beta_kubernetes_io_region': 'us-central1',
     'failure_domain_beta_kubernete

### Total Inference Log Storage allocated

* Total Inference Log Storage allocated
* query
* `kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="plateau-managed-disk"}`
* Total amount of inference log storage available.


In [44]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: capacity in bytes of the inference log storage volume
query = 'kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="plateau-managed-disk"}'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'__name__': 'kubelet_volume_stats_capacity_bytes',
     'beta_kubernetes_io_arch': 'amd64',
     'beta_kubernetes_io_instance_type': 'e2-standard-8',
     'beta_kubernetes_io_os': 'linux',
     'cloud_google_com_gke_boot_disk': 'pd-balanced',
     'cloud_google_com_gke_container_runtime': 'containerd',
     'cloud_google_com_gke_cpu_scaling_level': '8',
     'cloud_google_com_gke_logging_variant': 'DEFAULT',
     'cloud_google_com_gke_max_pods_per_node': '110',
     'cloud_google_com_gke_memory_gb_scaling_level': '32',
     'cloud_google_com_gke_nodepool': 'persistent',
     'cloud_google_com_gke_os_distribution': 'cos',
     'cloud_google_com_gke_provisioning': 'standard',
     'cloud_google_com_gke_stack_type': 'IPV4',
     'cloud_google_com_machine_family': 'e2',
     'cloud_google_com_private_node': 'false',
     'failure_domain_beta_kubernetes_io_region': 'us-central1',
     'failure_domain_beta_kuber

### Total Artifact Storage allocated

* Total Artifact Storage allocated
* query
* `kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="minio"}`
* Total amount of model and orchestration artifact storage available.



In [45]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: capacity in bytes of the artifact storage volume
query = 'kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="minio"}'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'__name__': 'kubelet_volume_stats_capacity_bytes',
     'beta_kubernetes_io_arch': 'amd64',
     'beta_kubernetes_io_instance_type': 'e2-standard-8',
     'beta_kubernetes_io_os': 'linux',
     'cloud_google_com_gke_boot_disk': 'pd-balanced',
     'cloud_google_com_gke_container_runtime': 'containerd',
     'cloud_google_com_gke_cpu_scaling_level': '8',
     'cloud_google_com_gke_logging_variant': 'DEFAULT',
     'cloud_google_com_gke_max_pods_per_node': '110',
     'cloud_google_com_gke_memory_gb_scaling_level': '32',
     'cloud_google_com_gke_nodepool': 'persistent',
     'cloud_google_com_gke_os_distribution': 'cos',
     'cloud_google_com_gke_provisioning': 'standard',
     'cloud_google_com_gke_stack_type': 'IPV4',
     'cloud_google_com_machine_family': 'e2',
     'cloud_google_com_private_node': 'false',
     'failure_domain_beta_kubernetes_io_region': 'us-central1',
     'failure_domain_beta_kuber

### Total Artifact Storage used

* Total Artifact Storage used
* query
* `kubelet_volume_stats_used_bytes{persistentvolumeclaim="minio"}`
* Amount of model and orchestration artifact storage used.


In [46]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: bytes used on the artifact storage volume
query = 'kubelet_volume_stats_used_bytes{persistentvolumeclaim="minio"}'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'__name__': 'kubelet_volume_stats_used_bytes',
     'beta_kubernetes_io_arch': 'amd64',
     'beta_kubernetes_io_instance_type': 'e2-standard-8',
     'beta_kubernetes_io_os': 'linux',
     'cloud_google_com_gke_boot_disk': 'pd-balanced',
     'cloud_google_com_gke_container_runtime': 'containerd',
     'cloud_google_com_gke_cpu_scaling_level': '8',
     'cloud_google_com_gke_logging_variant': 'DEFAULT',
     'cloud_google_com_gke_max_pods_per_node': '110',
     'cloud_google_com_gke_memory_gb_scaling_level': '32',
     'cloud_google_com_gke_nodepool': 'persistent',
     'cloud_google_com_gke_os_distribution': 'cos',
     'cloud_google_com_gke_provisioning': 'standard',
     'cloud_google_com_gke_stack_type': 'IPV4',
     'cloud_google_com_machine_family': 'e2',
     'cloud_google_com_private_node': 'false',
     'failure_domain_beta_kubernetes_io_region': 'us-central1',
     'failure_domain_beta_kubernete

### Average GPU usage over time

* Average GPU usage over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)`
* Average GPU usage over the defined time range in the Wallaroo cluster.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

# PromQL query: GPUs requested, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)'

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703472.026, '2']}]}}
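
In the Prometheus HTTP API, `start`, `end`, and `step` are parameters of the range query route (`query_range`), while the instant `query` route evaluates the expression at a single point in time. The response above is a single vector sample stamped with the request time, which is consistent with an instant evaluation. The sketch below builds the parameters for a range query. It assumes that the Wallaroo metrics proxy also exposes `/v1/metrics/api/v1/query_range`, which this tutorial does not confirm, and it uses a fixed UTC-7 offset for Mountain Standard Time in place of `pytz` so that it depends only on the standard library. The name `range_query_params` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Mountain Standard Time (UTC-7) applies to these late-November dates.
MOUNTAIN_STANDARD = timezone(timedelta(hours=-7), name="MST")


def range_query_params(query, start, end, step):
    """Build the parameter dict for a Prometheus range query.

    start and end must be timezone-aware datetimes; they are sent as Unix seconds.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "query": query,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
        "step": step,
    }


params = range_query_params(
    'avg_over_time(sum(wallaroo_kube_pod_resource_requests'
    '{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)',
    start=datetime(2025, 11, 24, 13, 0, 0, tzinfo=MOUNTAIN_STANDARD),
    end=datetime(2025, 11, 25, 15, 59, 59, tzinfo=MOUNTAIN_STANDARD),
    step="5m",
)
print(params)

# Hypothetical request, assuming the route exists in your environment:
# range_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query_range"
# response = requests.get(range_url, headers=wl.auth.auth_header(), params=params)
```

A successful range query returns `resultType` of `matrix`, with a list of `[timestamp, value]` pairs per series rather than a single `value`.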

### Average GPU requested over time

* Average GPU requested over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)`
* Average number of GPU requested over the defined time range in the Wallaroo cluster


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: GPUs requested, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703487.046, '2']}]}}

### Average CPU usage over time

* Average CPU usage over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)`
* Average CPU usage over the defined time range in the Wallaroo cluster.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: CPU usage, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {},
    'value': [1764703508.368, '0.13975404652444443']}]}}

### Average CPU requested over time

* Average CPU requested over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)`
* Average CPU requests over the defined time range in the Wallaroo cluster


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: CPUs requested, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703514.524, '8.313222222222224']}]}}

### Average Memory usage over time

* Average Memory usage over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)`
* Average memory usage over the defined time range in the Wallaroo cluster.


In [68]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: memory usage, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764709762.838, '14991024310.044445']}]}}

### Average Memory requests over time

* Average Memory requests over time
* query
* `avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)`
* Average memory requests over the defined time range in the Wallaroo cluster.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: memory requested, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764703525.098, '25661273829.831112']}]}}

### Average pipelines CPU usage over time

* Average pipelines CPU usage over time
* query
* `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)`
* Average CPU usage over the defined time range for an individual Wallaroo pipeline.


In [64]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# PromQL query: CPU usage per namespace, averaged over a 1h window offset by 1h
query = 'avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_usage{resource="cpu"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764709510.656, '0.014533953894444444']},
   {'metric': {'namespace': 'wallaroo'},
    'value': [1764709510.656, '0.2700718789166666']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764709510.656, '0.009850870877777779']},
   {'metric': {'namespace': 'whisper-hf-byop-replicatest-53'},
    'value': [1764709510.656, '0.008777500291666665']}]}}
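
The response above is a Prometheus instant vector: one sample per namespace, each a `[timestamp, value]` pair with the value encoded as a string. A minimal sketch of flattening that structure into plain Python rows follows. The helper name `vector_to_rows` is hypothetical and not part of the Wallaroo SDK, and the sample payload is copied from the output above.

```python
from datetime import datetime, timezone


def vector_to_rows(payload, label="namespace"):
    """Flatten an instant-vector response into a list of dicts.

    Hypothetical helper for illustration; not part of the Wallaroo SDK.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query did not succeed: {payload}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")

    rows = []
    for sample in data["result"]:
        unix_seconds, raw_value = sample["value"]
        rows.append({
            label: sample["metric"].get(label),
            # Timestamps are Unix seconds; keep them timezone-aware.
            "time": datetime.fromtimestamp(unix_seconds, tz=timezone.utc),
            # Values arrive as strings and need an explicit conversion.
            "value": float(raw_value),
        })
    return rows


# Sample payload copied from the CPU usage response above.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "tinyllama-openai-414"},
             "value": [1764709510.656, "0.014533953894444444"]},
            {"metric": {"namespace": "wallaroo"},
             "value": [1764709510.656, "0.2700718789166666"]},
            {"metric": {"namespace": "whisper-hf-byop-jcw-48"},
             "value": [1764709510.656, "0.009850870877777779"]},
            {"metric": {"namespace": "whisper-hf-byop-replicatest-53"},
             "value": [1764709510.656, "0.008777500291666665"]},
        ],
    },
}

rows = vector_to_rows(sample_response)
for row in rows:
    print(row["namespace"], row["time"].isoformat(), row["value"])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a table is preferred.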

### Average pipelines CPU requested over time

* Average pipelines CPU requested over time
* query
* `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)`
* Average number of CPUs requested over the defined time range for an individual Wallaroo pipeline.


In [66]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 2, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average CPUs requested per namespace: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource="cpu"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764709573.525, '1.6']},
   {'metric': {'namespace': 'wallaroo'},
    'value': [1764709573.525, '3.2560000000000002']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764709573.525, '4.35']},
   {'metric': {'namespace': 'whisper-hf-byop-replicatest-53'},
    'value': [1764709573.525, '0.1']}]}}
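
Comparing CPU usage against CPU requested gives a rough utilization ratio per namespace. The sketch below assumes both queries were run close together, as they were here (about a minute apart), and uses the sample values from the two responses above. The function name `cpu_utilization` is hypothetical.

```python
def cpu_utilization(usage_result, requested_result):
    """Return usage / requested per namespace for namespaces present in both.

    Hypothetical helper; each argument is the `result` list of an
    instant-vector response grouped by namespace.
    """
    usage = {s["metric"]["namespace"]: float(s["value"][1]) for s in usage_result}
    requested = {s["metric"]["namespace"]: float(s["value"][1]) for s in requested_result}

    ratios = {}
    for namespace, cpus_requested in requested.items():
        # Skip namespaces with no usage sample or a zero request.
        if namespace in usage and cpus_requested > 0:
            ratios[namespace] = usage[namespace] / cpus_requested
    return ratios


# `result` lists copied from the CPU usage and CPU requested responses above.
usage_result = [
    {"metric": {"namespace": "tinyllama-openai-414"}, "value": [1764709510.656, "0.014533953894444444"]},
    {"metric": {"namespace": "wallaroo"}, "value": [1764709510.656, "0.2700718789166666"]},
    {"metric": {"namespace": "whisper-hf-byop-jcw-48"}, "value": [1764709510.656, "0.009850870877777779"]},
    {"metric": {"namespace": "whisper-hf-byop-replicatest-53"}, "value": [1764709510.656, "0.008777500291666665"]},
]
requested_result = [
    {"metric": {"namespace": "tinyllama-openai-414"}, "value": [1764709573.525, "1.6"]},
    {"metric": {"namespace": "wallaroo"}, "value": [1764709573.525, "3.2560000000000002"]},
    {"metric": {"namespace": "whisper-hf-byop-jcw-48"}, "value": [1764709573.525, "4.35"]},
    {"metric": {"namespace": "whisper-hf-byop-replicatest-53"}, "value": [1764709573.525, "0.1"]},
]

ratios = cpu_utilization(usage_result, requested_result)
for namespace, ratio in sorted(ratios.items()):
    print(f"{namespace}: {ratio:.1%} of requested CPU in use")
```

A low ratio over a long window may indicate that a pipeline requests more CPU than it needs, though a single one-hour average is not enough to conclude that.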

### Average pipelines GPU usage over time

* Average pipelines GPU usage over time
* query
* `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)`
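* Average GPU usage over the defined time range for an individual Wallaroo pipeline.  This query reads the `wallaroo_kube_pod_resource_requests` metric, so it returns the same values as the GPU requested query in the next section.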



In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average GPUs (NVIDIA or Qualcomm QAIC) per namespace: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764703545.245, '1']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764703545.245, '1']}]}}

### Average pipelines GPU requested over time

* Average pipelines GPU requested over time
* query
* `avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)`
* Average number of GPUs requested over the defined time range for an individual Wallaroo pipeline.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average GPUs requested (NVIDIA or Qualcomm QAIC) per namespace: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764703550.567, '1']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764703550.567, '1']}]}}
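
The responses in this section have `resultType` set to `vector` and carry the timestamp at which the request was made, which is how a Prometheus instant query behaves. In the Prometheus HTTP API an instant query is pinned to a specific moment with the `time` parameter, while `start`, `end`, and `step` belong to range queries. The sketch below builds instant-query parameters under the assumption that the Wallaroo metrics endpoint follows the same convention. The helper name `instant_query_params` is hypothetical.

```python
from datetime import datetime, timezone


def instant_query_params(query, at=None):
    """Build parameters for a Prometheus-style instant query.

    Hypothetical helper. `at` is an optional timezone-aware datetime;
    when omitted, the server evaluates the query at its current time.
    """
    params = {"query": query}
    if at is not None:
        if at.tzinfo is None:
            raise ValueError("`at` must be timezone-aware")
        # Prometheus accepts Unix seconds for the evaluation time.
        params["time"] = int(at.timestamp())
    return params


gpu_query = (
    'avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests'
    '{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[1h:] offset 1h)'
)

# 2025-11-25 15:59:59 US/Mountain (UTC-7 in November) expressed in UTC.
evaluation_time = datetime(2025, 11, 25, 22, 59, 59, tzinfo=timezone.utc)
params = instant_query_params(gpu_query, at=evaluation_time)
print(params)
```

The resulting dictionary could be passed as `params=` to `requests.get(query_url, headers=headers, params=params)` in place of `params_rps`.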

### Average pipelines Mem usage over time

* Average pipelines Mem usage over time
* query
* `avg_over_time(sum by(namespace) (wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)`
* Average memory usage over the defined time range for an individual Wallaroo pipeline.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average memory usage per namespace: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by(namespace) (wallaroo_kube_pod_resource_usage{resource="memory"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764703563.03, '7787741443.413333']},
   {'metric': {'namespace': 'wallaroo'},
    'value': [1764703563.03, '5640514273.28']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764703563.03, '920613997.2266667']},
   {'metric': {'namespace': 'whisper-hf-byop-replicatest-53'},
    'value': [1764703563.03, '15823080.106666667']}]}}
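
The memory values above are large raw numbers. Assuming they are reported in bytes, which is the usual Kubernetes convention, the following sketch converts them to GiB for easier reading. The names `bytes_to_gib` and `memory_usage` are illustrative only.

```python
GIB = 2 ** 30  # bytes per gibibyte


def bytes_to_gib(raw_value):
    """Convert a Prometheus sample value (string or number of bytes) to GiB."""
    return float(raw_value) / GIB


# Namespace -> raw value, copied from the memory usage response above.
memory_usage = {
    "tinyllama-openai-414": "7787741443.413333",
    "wallaroo": "5640514273.28",
    "whisper-hf-byop-jcw-48": "920613997.2266667",
    "whisper-hf-byop-replicatest-53": "15823080.106666667",
}

memory_gib = {namespace: bytes_to_gib(raw) for namespace, raw in memory_usage.items()}
for namespace, gib in memory_gib.items():
    print(f"{namespace}: {gib:.2f} GiB")
```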

### Average pipelines Mem requested over time

* Average pipelines Mem requested over time
* query
* `avg_over_time(sum by (namespace)(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)`
* Average amount of memory requested over the defined time range for an individual Wallaroo pipeline.


In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average memory requested per namespace: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by (namespace)(wallaroo_kube_pod_resource_requests{resource="memory"})[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'namespace': 'tinyllama-openai-414'},
    'value': [1764703571.307, '9797894144']},
   {'metric': {'namespace': 'wallaroo'},
    'value': [1764703571.307, '5936636522.951111']},
   {'metric': {'namespace': 'whisper-hf-byop-jcw-48'},
    'value': [1764703571.307, '9797894144']},
   {'metric': {'namespace': 'whisper-hf-byop-replicatest-53'},
    'value': [1764703571.307, '134217728']}]}}

### Pipeline inference log storage

* Pipeline inference log storage
* query
* `avg_over_time(sum by(topic) (topic_bytes)[1h:] offset 1h)`
* Average inference log storage used over the defined time range for an individual Wallaroo pipeline.

In [None]:
# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

step = "5m" # the step of the calculation
# this will also format the timezone in the parsing section
timezone = "US/Mountain"
selected_timezone = pytz.timezone(timezone)
# Define the start and end times

data_start = selected_timezone.localize(datetime.datetime(2025, 11, 24, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 11, 25, 15, 59, 59))

# Convert to UTC and get the Unix timestamps

start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())

# Average inference log storage per topic: 1h subquery window, shifted back 1h by the offset
query = 'avg_over_time(sum by(topic) (topic_bytes)[1h:] offset 1h)'

#request parameters
params_rps = {
    'query': query,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response = requests.get(query_url, headers=headers, params=params_rps)


if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Query Response:


{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {'topic': 'workspace-1444-pipeline-dlrm-click-prediction-inference'},
    'value': [1764703578.734, '4282479']},
   {'metric': {'topic': 'workspace-1526-pipeline-ma-consumptionchanges-stage-inference'},
    'value': [1764703578.734, '1148670']},
   {'metric': {'topic': 'workspace-1529-pipeline-rum-assay-nan-jcw-inference'},
    'value': [1764703578.734, '2041174']},
   {'metric': {'topic': 'workspace-42-pipeline-retail-inv-tracker-edge-obs-inference'},
    'value': [1764703578.734, '3500604']},
   {'metric': {'topic': 'workspace-71-pipeline-assay-demonstration-tutorial-jcw-inference'},
    'value': [1764703578.734, '60325542']},
   {'metric': {'topic': 'workspace-86-pipeline-house-price-predictor-drift-inference'},
    'value': [1764703578.734, '77898428']}]}}
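
The topic names in this response appear to follow the pattern `workspace-<id>-pipeline-<name>-inference`. That pattern is inferred from the sample output above and is not a documented contract, so the sketch below treats it as an assumption and returns `None` for any topic that does not match. The function name `parse_topic` is hypothetical.

```python
import re

# Pattern inferred from the sample output; an assumption, not a documented format.
TOPIC_PATTERN = re.compile(r"^workspace-(\d+)-pipeline-(.+)-inference$")


def parse_topic(topic):
    """Split a topic name into (workspace_id, pipeline_name), or None if it does not match."""
    match = TOPIC_PATTERN.match(topic)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Topic -> raw value, copied from part of the response above.
topic_values = {
    "workspace-1444-pipeline-dlrm-click-prediction-inference": "4282479",
    "workspace-71-pipeline-assay-demonstration-tutorial-jcw-inference": "60325542",
    "workspace-86-pipeline-house-price-predictor-drift-inference": "77898428",
}

storage_by_pipeline = {}
for topic, raw_value in topic_values.items():
    parsed = parse_topic(topic)
    if parsed is not None:
        storage_by_pipeline[parsed] = float(raw_value)

for (workspace_id, pipeline_name), stored in storage_by_pipeline.items():
    print(f"workspace {workspace_id}, pipeline {pipeline_name}: {stored:.0f}")
```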