# EOEPCA Resource Health Validation and Usage Notebook

## Introduction

The **Resource Health Building Block** provides a flexible framework for monitoring the health and status of resources within a platform. This includes core platform services as well as derived or user-provided resources such as datasets, workflows or user applications.

## Setup

In [1]:
import os
import requests
import json
from pathlib import Path

import sys
sys.path.append('../')
from modules.helpers import get_access_token, load_eoepca_state, test_cell, test_results

Load `eoepca state` environment

In [2]:
load_eoepca_state()

In [3]:
platform_domain = os.environ.get("INGRESS_HOST")
http_scheme = os.environ.get("HTTP_SCHEME", "https")
resource_health_domain = f'{http_scheme}://resource-health.{platform_domain}'

# If your cluster uses a self-signed CA certificate, pointing requests at a CA bundle that includes it avoids TLS errors below.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"
verify_tls = True

print(f"Resource Health URL: {resource_health_domain}")

Resource Health URL: https://resource-health.test.eoepca.org
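
If `INGRESS_HOST` is unset, `os.environ.get` returns `None` and the f-string above silently produces `https://resource-health.None`. A minimal sketch of a stricter variant follows. The function name `resource_health_base_url` is hypothetical and not part of `modules.helpers`.

```python
import os


def resource_health_base_url(env=None):
    """Build the Resource Health base URL, failing early if INGRESS_HOST is missing.

    `env` defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    host = env.get("INGRESS_HOST")
    if not host:
        raise RuntimeError("INGRESS_HOST is not set; run load_eoepca_state() first")
    scheme = env.get("HTTP_SCHEME", "https")
    return f"{scheme}://resource-health.{host}"


# Example with an explicit mapping instead of the real environment
print(resource_health_base_url({"INGRESS_HOST": "test.eoepca.org"}))
```

Failing early here gives a clearer error than a connection failure against a malformed hostname later in the notebook.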


## Authentication Setup

The Resource Health API requires OIDC authentication. We'll obtain an access token from Keycloak.

In [4]:
def get_auth_headers():
    """Get authentication headers for Resource Health API requests."""
    tokens = get_access_token(
        username=os.environ.get("KEYCLOAK_TEST_USER"),
        password=os.environ.get("KEYCLOAK_TEST_PASSWORD"),
        client_id="resource-health",
        client_secret=os.environ.get("RESOURCE_HEALTH_CLIENT_SECRET")
    )
    return {
        "Authorization": f"Bearer {tokens}",
        "Content-Type": "application/vnd.api+json",
    }

# Test authentication
try:
    auth_headers = get_auth_headers()
    print("✅ Successfully obtained access token")
except Exception as e:
    print(f"❌ Failed to obtain access token: {e}")
    print("Please ensure KEYCLOAK_TEST_USER, KEYCLOAK_TEST_PASSWORD, and RESOURCE_HEALTH_CLIENT_SECRET are set.")

✅ Successfully obtained access token
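
`get_access_token` comes from `modules.helpers` and its body is not shown in this notebook. As a rough sketch of what such a helper might do, the block below assembles an OAuth 2.0 password-grant request for Keycloak's OpenID Connect token endpoint. The function name `build_token_request`, the host `auth.test.eoepca.org`, the realm `eoepca` and the credentials are assumptions for illustration only; check your deployment for the real values.

```python
def build_token_request(keycloak_url, realm, username, password, client_id, client_secret=None):
    """Return (url, form_data) for an OAuth 2.0 password-grant token request."""
    # Recent Keycloak releases serve realms directly under the base URL;
    # older ones may need an extra "/auth" prefix in keycloak_url.
    url = f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = {
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": client_id,
    }
    # Confidential clients also send their secret; public clients omit it.
    if client_secret:
        form["client_secret"] = client_secret
    return url, form


url, form = build_token_request(
    "https://auth.test.eoepca.org",  # assumed Keycloak host
    "eoepca",                        # assumed realm name
    "eoepcauser", "changeme",        # placeholder credentials
    client_id="resource-health",
)
print(url)
# The request itself would be sent form-encoded, for example:
#   requests.post(url, data=form, timeout=10).json()["access_token"]
```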


## Validate Resource Health Endpoints

Let's verify the Resource Health services are accessible.

In [5]:
# Endpoints that don't require authentication (API documentation pages)
public_endpoints = [
    ("Health Checks API Docs", f"{resource_health_domain}/api/healthchecks/docs"),
    ("Telemetry API Docs", f"{resource_health_domain}/api/telemetry/docs"),
]

# Endpoints that require authentication
auth_endpoints = [
    ("Check Templates", f"{resource_health_domain}/api/healthchecks/v1/check_templates/"),
    ("Checks List", f"{resource_health_domain}/api/healthchecks/v1/checks/"),
]

print("Validating Resource Health endpoints...\n")

print("Public endpoints:")
for name, url in public_endpoints:
    try:
        response = requests.get(url, verify=verify_tls, timeout=10)
        status = "✅" if response.status_code == 200 else "❌"
        print(f"  {status} {name}: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"  ❌ {name}: Connection error - {e}")

print("\nAuthenticated endpoints:")
auth_headers = get_auth_headers()
for name, url in auth_endpoints:
    try:
        response = requests.get(url, headers=auth_headers, verify=verify_tls, timeout=10)
        status = "✅" if response.status_code == 200 else "❌"
        print(f"  {status} {name}: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"  ❌ {name}: Connection error - {e}")

Validating Resource Health endpoints...

Public endpoints:
  ✅ Health Checks API Docs: 200
  ✅ Telemetry API Docs: 200

Authenticated endpoints:
  ✅ Check Templates: 200
  ✅ Checks List: 200


## Understanding Health Check Templates

Health check templates define reusable patterns for common monitoring scenarios. The default deployment includes:

- **simple_ping** - Checks if an endpoint responds with an expected HTTP status code
- **generic_script_template** - Runs custom pytest scripts for advanced health checks
- **telemetry_access_template** - Runs scripts with access to telemetry data

In [6]:
templates_url = f"{resource_health_domain}/api/healthchecks/v1/check_templates/"
auth_headers = get_auth_headers()

response = requests.get(templates_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    templates = data.get('data', [])
    print(f"Found {len(templates)} health check template(s):\n")
    
    for template in templates:
        template_id = template.get('id')
        attrs = template.get('attributes', {})
        metadata = attrs.get('metadata', {})
        
        print(f"Template: {template_id}")
        print(f"  Label: {metadata.get('label', 'N/A')}")
        print(f"  Description: {metadata.get('description', 'N/A')}")
        print()
else:
    print(f"Error fetching templates: {response.status_code}")
    print(response.text)

Found 3 health check template(s):

Template: simple_ping
  Label: Simple ping template
  Description: Simple template with preset script for pinging single endpoint.

Template: generic_script_template
  Label: Generic script template
  Description: Runs a user-provided pytest script from a specified remote or data url

Template: telemetry_access_template
  Label: Script with telemetry access
  Description: Health check template that runs a userscript with telemetry access on localhost:8080



### Examine a Template in Detail

Let's look at one of the templates in detail to understand its configuration options.

In [7]:
# Get the first available template
templates_url = f"{resource_health_domain}/api/healthchecks/v1/check_templates/"
auth_headers = get_auth_headers()

response = requests.get(templates_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    templates = data.get('data', [])
    
    if templates:
        template_id = templates[0].get('id')
        template_url = f"{resource_health_domain}/api/healthchecks/v1/check_templates/{template_id}"
        
        detail_response = requests.get(template_url, headers=auth_headers, verify=verify_tls)
        
        if detail_response.status_code == 200:
            template = detail_response.json()
            print(f"Template Details: {template_id}")
            print(json.dumps(template, indent=2))
        else:
            print(f"Error fetching template details: {detail_response.status_code}")
    else:
        print("No templates available")
else:
    print(f"Error: {response.status_code}")

Template Details: simple_ping
{
  "data": {
    "id": "simple_ping",
    "type": "check_template",
    "attributes": {
      "metadata": {
        "label": "Simple ping template",
        "description": "Simple template with preset script for pinging single endpoint."
      },
      "arguments": {
        "additionalProperties": false,
        "properties": {
          "endpoint": {
            "format": "textarea",
            "title": "Endpoint",
            "type": "string"
          },
          "expected_status_code": {
            "default": 200,
            "exclusiveMaximum": 600,
            "minimum": 100,
            "title": "Expected Status Code",
            "type": "integer"
          }
        },
        "required": [
          "endpoint"
        ],
        "title": "SimplePingArguments",
        "type": "object",
        "$schema": "http://json-schema.org/draft-07/schema"
      }
    },
    "links": {
      "self": "https://resource-health.develop.eoepca.org/api/health
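
The `arguments` object in the output above is a JSON Schema describing the `template_args` a check may pass to `simple_ping`. As a minimal sketch, the function below hand-checks only the rules visible in that schema: a required string `endpoint`, no extra properties, and an integer status code with `100 <= code < 600`. `validate_simple_ping_args` is a hypothetical helper, not part of the Resource Health API; a general-purpose JSON Schema validator would cover more cases.

```python
def validate_simple_ping_args(args):
    """Return a list of problems found in `args`; an empty list means the args look valid."""
    problems = []
    allowed = {"endpoint", "expected_status_code"}

    # "additionalProperties": false
    for key in sorted(set(args) - allowed):
        problems.append(f"unexpected property: {key}")

    # "required": ["endpoint"], "type": "string"
    if "endpoint" not in args:
        problems.append("missing required property: endpoint")
    elif not isinstance(args["endpoint"], str):
        problems.append("endpoint must be a string")

    # "type": "integer", "minimum": 100, "exclusiveMaximum": 600 (default 200 if omitted)
    code = args.get("expected_status_code", 200)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(code, bool) or not isinstance(code, int):
        problems.append("expected_status_code must be an integer")
    elif not 100 <= code < 600:
        problems.append("expected_status_code must satisfy 100 <= code < 600")

    return problems


print(validate_simple_ping_args({"endpoint": "https://www.google.com", "expected_status_code": 200}))
print(validate_simple_ping_args({"expected_status_code": 600, "retries": 3}))
```

Checking arguments locally like this can catch mistakes before the API rejects the request.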

## List Existing Health Checks

Let's see what health checks are currently registered in the system.

In [8]:
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()

response = requests.get(checks_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    checks = data.get('data', [])
    
    if checks:
        print(f"Found {len(checks)} health check(s):\n")
        for check in checks:
            check_id = check.get('id')
            attrs = check.get('attributes', {})
            metadata = attrs.get('metadata', {})
            
            print(f"Check ID: {check_id}")
            print(f"  Name: {metadata.get('name', 'N/A')}")
            print(f"  Description: {metadata.get('description', 'N/A')}")
            print(f"  Schedule: {attrs.get('schedule', 'N/A')}")
            print(f"  Template: {metadata.get('template_id', 'N/A')}")
            print()
    else:
        print("No health checks registered yet.")
else:
    print(f"Error fetching checks: {response.status_code}")
    print(response.text)

Found 1 health check(s):

Check ID: 5404e7bb-0aca-4bac-903f-6e10afa8334b
  Name: google-ping-check
  Description: Check if Google is reachable
  Schedule: */5 * * * *
  Template: simple_ping



## Create a Health Check

The Resource Health API uses the [JSON:API](https://jsonapi.org/) format. Let's create a health check using the `simple_ping` template.

**Note:** The available templates depend on your deployment configuration. The payload below uses `simple_ping`; `generic_script_template` could instead run a custom pytest script such as the `ping_script` defined in the same cell.

In [9]:
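# Define a health check using the simple_ping template.
# ping_script is an example pytest script (it pings httpbin.org and verifies the response)
# that generic_script_template could run; the payload below does not use it.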

ping_script = '''
import requests

def test_httpbin_ping():
    """Check if httpbin.org is responding with status 200."""
    response = requests.get("https://httpbin.org/status/200")
    assert response.status_code == 200, f"Expected 200, got {response.status_code}"
'''

healthcheck_payload = {
  "data": {
    "type": "check",
    "attributes": {
      "schedule": "*/1 * * * *",
      "metadata": {
        "name": "google-ping-check",
        "description": "Check if Google is reachable",
        "template_id": "simple_ping",
        "template_args": {
          "endpoint": "https://www.google.com",
          "expected_status_code": 200
        }
      }
    }
  }
}


print("Health check definition:")
print(json.dumps(healthcheck_payload, indent=2))

Health check definition:
{
  "data": {
    "type": "check",
    "attributes": {
      "schedule": "*/1 * * * *",
      "metadata": {
        "name": "google-ping-check",
        "description": "Check if Google is reachable",
        "template_id": "simple_ping",
        "template_args": {
          "endpoint": "https://www.google.com",
          "expected_status_code": 200
        }
      }
    }
  }
}
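
The payload above follows the JSON:API shape of a top-level `data` object carrying `type` and `attributes`. A minimal sketch of a builder for such payloads is shown below. `build_check_payload` is a hypothetical helper; the field names are copied from the payload above, and the five-field schedule test is only a sanity check, not full cron validation.

```python
import json


def build_check_payload(name, description, template_id, template_args, schedule):
    """Assemble a JSON:API document for creating a health check."""
    # Sanity check only: standard cron expressions have five whitespace-separated fields.
    if len(schedule.split()) != 5:
        raise ValueError(f"expected a five-field cron schedule, got {schedule!r}")
    return {
        "data": {
            "type": "check",
            "attributes": {
                "schedule": schedule,
                "metadata": {
                    "name": name,
                    "description": description,
                    "template_id": template_id,
                    "template_args": template_args,
                },
            },
        }
    }


payload = build_check_payload(
    name="google-ping-check",
    description="Check if Google is reachable",
    template_id="simple_ping",
    template_args={"endpoint": "https://www.google.com", "expected_status_code": 200},
    schedule="*/1 * * * *",
)
print(json.dumps(payload, indent=2))
```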


In [10]:
# Register the health check
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()
response = requests.post(
    checks_url,
    headers=auth_headers,
    json=healthcheck_payload,
    verify=verify_tls
)

if response.status_code in [200, 201]:
    result = response.json()
    check_id = result.get('data', {}).get('id')
    print("✅ Health check created successfully!")
    print(f"   Check ID: {check_id}")
    print("\nFull response:")
    print(json.dumps(result, indent=2))
else:
    print(f"❌ Failed to create health check: {response.status_code}")
    print(response.text)

✅ Health check created successfully!
   Check ID: ce012199-0c28-4e9c-9b00-c200f76adef0

Full response:
{
  "data": {
    "id": "ce012199-0c28-4e9c-9b00-c200f76adef0",
    "type": "check",
    "attributes": {
      "metadata": {
        "name": "google-ping-check",
        "description": "Check if Google is reachable",
        "template_id": "simple_ping",
        "template_args": {
          "endpoint": "https://www.google.com",
          "expected_status_code": 200
        }
      },
      "schedule": "*/1 * * * *",
      "outcome_filter": {
        "resource_attributes": {
          "k8s.cronjob.name": [
            "ce012199-0c28-4e9c-9b00-c200f76adef0"
          ]
        }
      }
    },
    "links": {
      "self": "https://resource-health.develop.eoepca.org/api/healthchecks/v1/checks/ce012199-0c28-4e9c-9b00-c200f76adef0",
      "check_template": "https://resource-health.develop.eoepca.org/api/healthchecks/v1/check_templates/simple_ping"
    }
  },
  "links": {
    "root": "https

## Verify the Health Check was Created

Let's confirm our health check is now registered.

In [11]:
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()

response = requests.get(checks_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    checks = data.get('data', [])
    
    print(f"Registered health checks ({len(checks)}):\n")
    for check in checks:
        check_id = check.get('id')
        attrs = check.get('attributes', {})
        metadata = attrs.get('metadata', {})
        
        print(f"  • {metadata.get('name', 'Unknown')} (ID: {check_id[:8] if check_id else 'N/A'}...)")
        print(f"    Schedule: {attrs.get('schedule')}")
        print(f"    Template: {metadata.get('template_id')}")
        print()
else:
    print(f"Error: {response.status_code}")

Registered health checks (2):

  • google-ping-check (ID: 5404e7bb...)
    Schedule: */5 * * * *
    Template: simple_ping

  • google-ping-check (ID: ce012199...)
    Schedule: */1 * * * *
    Template: simple_ping
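
The listing above shows two checks sharing the name `google-ping-check`, so a name cannot be treated as a unique key; only the ID distinguishes them. A minimal sketch for spotting such duplicates follows. `find_duplicate_names` is a hypothetical helper, and `sample_checks` is a trimmed-down stand-in for the `data` list returned by the checks endpoint, using the IDs from the outputs above.

```python
from collections import defaultdict


def find_duplicate_names(checks):
    """Map each check name that occurs more than once to the list of check IDs using it."""
    ids_by_name = defaultdict(list)
    for check in checks:
        name = check.get("attributes", {}).get("metadata", {}).get("name")
        ids_by_name[name].append(check.get("id"))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Trimmed-down sample in the shape of the `data` list returned by the checks endpoint
sample_checks = [
    {"id": "5404e7bb-0aca-4bac-903f-6e10afa8334b",
     "attributes": {"metadata": {"name": "google-ping-check"}}},
    {"id": "ce012199-0c28-4e9c-9b00-c200f76adef0",
     "attributes": {"metadata": {"name": "google-ping-check"}}},
]
for name, ids in find_duplicate_names(sample_checks).items():
    print(f"{name}: {len(ids)} checks -> {ids}")
```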



## Get Details of a Specific Health Check

You can retrieve full details of a specific health check using its ID.

In [12]:
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()

response = requests.get(checks_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    checks = data.get('data', [])
    
    if checks:
        check_id = checks[0].get('id')
        
        # Fetch detailed info
        detail_url = f"{resource_health_domain}/api/healthchecks/v1/checks/{check_id}"
        detail_response = requests.get(detail_url, headers=auth_headers, verify=verify_tls)
        
        if detail_response.status_code == 200:
            print(f"Details for check {check_id}:")
            print(json.dumps(detail_response.json(), indent=2))
        else:
            print(f"Error fetching details: {detail_response.status_code}")
    else:
        print("No checks available")
else:
    print(f"Error: {response.status_code}")

Details for check 5404e7bb-0aca-4bac-903f-6e10afa8334b:
{
  "data": {
    "id": "5404e7bb-0aca-4bac-903f-6e10afa8334b",
    "type": "check",
    "attributes": {
      "metadata": {
        "name": "google-ping-check",
        "description": "Check if Google is reachable",
        "template_id": "simple_ping",
        "template_args": {
          "endpoint": "https://www.google.com",
          "expected_status_code": 200
        }
      },
      "schedule": "*/5 * * * *",
      "outcome_filter": {
        "resource_attributes": {
          "k8s.cronjob.name": [
            "5404e7bb-0aca-4bac-903f-6e10afa8334b"
          ]
        }
      }
    },
    "links": {
      "self": "https://resource-health.develop.eoepca.org/api/healthchecks/v1/checks/5404e7bb-0aca-4bac-903f-6e10afa8334b",
      "check_template": "https://resource-health.develop.eoepca.org/api/healthchecks/v1/check_templates/simple_ping"
    }
  },
  "links": {
    "self": "https://resource-health.develop.eoepca.org/api/healt

## Run Health Check Manually

In [13]:
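# check_id is carried over from the previous cell (the first check in the list)
run_url = f"{resource_health_domain}/api/healthchecks/v1/checks/{check_id}/run/"
auth_headers = get_auth_headers()
response = requests.post(run_url, headers=auth_headers, verify=verify_tls)

print(response.status_code)
if response.status_code in [200, 204]:
    print("✅ Health check run started successfully!")
else:
    print(f"❌ Failed to start health check run: {response.status_code}")
    print(response.text)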

204
✅ Health check run started successfully!


## View Health Check Results

In [14]:
# Query telemetry results
from datetime import datetime

telemetry_url = f"{resource_health_domain}/api/telemetry/v1/spans"
response = requests.get(telemetry_url, headers=get_auth_headers(), verify=verify_tls)

if response.status_code != 200:
    print(f"Error fetching telemetry: {response.status_code}")
else:
    data = response.json().get('data', [])
    if not data:
        print("No telemetry data yet. Run a health check first.")
    else:
        
        for record in data:
            for rs in record.get('attributes', {}).get('resourceSpans', []):
                # Get resource info
                res_attrs = {a['key']: list(a['value'].values())[0] for a in rs.get('resource', {}).get('attributes', [])}
                print(f"\nHealth Check: {res_attrs.get('health_check.name', 'Unknown')} (User: {res_attrs.get('user.id', 'Unknown')})")
                
                # Get test results
                for scope in rs.get('scopeSpans', []):
                    for span in scope.get('spans', []):
                        attrs = {a['key']: list(a['value'].values())[0] for a in span.get('attributes', [])}
                        if attrs.get('pytest.span_type') == 'test' and 'test.case.result.status' in attrs:
                            result = attrs['test.case.result.status']
                            status = "✅" if result == "pass" else "❌"
                            run_time = datetime.fromtimestamp(int(span['startTimeUnixNano']) / 1e9)
                            duration_ms = (int(span['endTimeUnixNano']) - int(span['startTimeUnixNano'])) / 1e6
                            print(f"  {status} {attrs.get('test.case.name')}: {result} ({run_time:%Y-%m-%d %H:%M:%S}, {duration_ms:.0f}ms)")



Health Check: google-ping-check (User: eoepcauser)
  ✅ test_ping: pass (2026-01-23 11:14:41, 430ms)
  ✅ test_ping: pass (2026-01-23 11:15:04, 441ms)
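
The nested loops in the cell above can be factored into small pure functions, which makes the parsing testable without a live deployment. The sketch below assumes the same attribute layout the cell relies on, a list of `{"key": ..., "value": {"<type>Value": ...}}` entries. `flatten_attributes` and `summarize_span` are hypothetical names, and `sample_span` is synthetic data for illustration, not real telemetry.

```python
from datetime import datetime, timezone


def flatten_attributes(attributes):
    """Turn [{'key': k, 'value': {'<type>Value': v}}, ...] into a plain {k: v} dict."""
    return {a["key"]: next(iter(a["value"].values())) for a in attributes}


def summarize_span(span):
    """Return (test name, status, start time in UTC, duration in ms), or None for non-test spans."""
    attrs = flatten_attributes(span.get("attributes", []))
    if attrs.get("pytest.span_type") != "test" or "test.case.result.status" not in attrs:
        return None
    start_ns = int(span["startTimeUnixNano"])
    end_ns = int(span["endTimeUnixNano"])
    started = datetime.fromtimestamp(start_ns / 1e9, tz=timezone.utc)
    duration_ms = (end_ns - start_ns) / 1e6
    return attrs.get("test.case.name"), attrs["test.case.result.status"], started, duration_ms


# Synthetic span for illustration only
sample_span = {
    "startTimeUnixNano": "1700000000000000000",
    "endTimeUnixNano": "1700000000430000000",
    "attributes": [
        {"key": "pytest.span_type", "value": {"stringValue": "test"}},
        {"key": "test.case.name", "value": {"stringValue": "test_ping"}},
        {"key": "test.case.result.status", "value": {"stringValue": "pass"}},
    ],
}
name, status, started, duration_ms = summarize_span(sample_span)
print(f"{name}: {status} ({started:%Y-%m-%d %H:%M:%S} UTC, {duration_ms:.0f}ms)")
```

Using an explicit UTC timezone avoids the output depending on the local timezone of the machine running the notebook.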


## Delete a Health Check

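To remove a health check, send a DELETE request to the check's URL, which is identified by the check ID. The cell below looks up the first check named `google-ping-check` and deletes it.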

In [None]:
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()

response = requests.get(checks_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    checks = data.get('data', [])
    
    target_check = None
    for check in checks:
        metadata = check.get('attributes', {}).get('metadata', {})
        if metadata.get('name') == 'google-ping-check':
            target_check = check
            break
    
    if target_check:
        check_id = target_check.get('id')
        print(f"Deleting health check: {check_id}")
        
        delete_response = requests.delete(
            f"{resource_health_domain}/api/healthchecks/v1/checks/{check_id}",
            headers=auth_headers,
            verify=verify_tls
        )
        
        if delete_response.status_code in [200, 204]:
            print(f"✅ Health check deleted successfully")
        else:
            print(f"❌ Delete failed: {delete_response.status_code}")
            print(delete_response.text)
    else:
        print("google-ping-check not found (may have been deleted already)")
else:
    print(f"Error: {response.status_code}")

jsonplaceholder-api-check not found (may have been deleted already)


In [62]:
# Verify deletion
checks_url = f"{resource_health_domain}/api/healthchecks/v1/checks/"
auth_headers = get_auth_headers()

response = requests.get(checks_url, headers=auth_headers, verify=verify_tls)

if response.status_code == 200:
    data = response.json()
    checks = data.get('data', [])
    
    print(f"Remaining health checks ({len(checks)}):")
    for check in checks:
        metadata = check.get('attributes', {}).get('metadata', {})
        print(f"  • {metadata.get('name')}")
else:
    print(f"Error: {response.status_code}")

Remaining health checks (0):


## Web Dashboard Access

The Resource Health web dashboard provides a graphical interface for:

- Viewing all registered health checks
- Monitoring recent check results
- Creating new health checks via UI
- Visualising success/failure trends

In [24]:
print(f"Web Dashboard URL: {resource_health_domain}")
print(f"\nThe dashboard shows:")
print("  • All registered health checks")
print("  • Recent check results and status")
print("  • Success/failure indicators")
print("  • Option to create new checks")
print(f"\nNote: You will need to authenticate via Keycloak to access the dashboard.")

Web Dashboard URL: https://resource-health.test.eoepca.org

The dashboard shows:
  • All registered health checks
  • Recent check results and status
  • Success/failure indicators
  • Option to create new checks

Note: You will need to authenticate via Keycloak to access the dashboard.


### Further Resources

- [Resource Health GitHub Repository](https://github.com/EOEPCA/resource-health)
- [EOEPCA Deployment Guide](https://deployment-guide.docs.eoepca.org/current/building-blocks/resource-health/)
- [OpenTelemetry Documentation](https://opentelemetry.io/)
- [OpenSearch Documentation](https://opensearch.org/docs/)
- [JSON:API Specification](https://jsonapi.org/)