From c5d54b4ec1238a3c4a97f9de7e5c16191969cddc Mon Sep 17 00:00:00 2001
From: Oliver Meyer <oliverm@aignostics.com>
Date: Thu, 11 Dec 2025 13:08:53 +0100
Subject: [PATCH] docs: document system health checks on run submission

---
 README.md                            | 66 +++++++++++++++++++++++++++-
 docs/partials/README_glossary.md     |  5 ++-
 docs/partials/README_main.md         | 61 +++++++++++++++++++++++++
 src/aignostics/application/CLAUDE.md | 51 +++++++++++++++++++++
 src/aignostics/gui/CLAUDE.md         | 36 +++++++++++++--
 src/aignostics/system/CLAUDE.md      | 54 +++++++++++++++++++++++
 6 files changed, 268 insertions(+), 5 deletions(-)
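
Note on the Python Library sections below: they leave health verification, including retries for transient failures, to the caller. A minimal caller-side sketch follows. It assumes only what this patch's own snippets show, namely a health object that is falsy when unhealthy and exposes `.reason`; `wait_until_healthy` and `HealthLike` are hypothetical helper names, not part of the SDK.

```python
from __future__ import annotations

import time
from collections.abc import Callable
from typing import Protocol


class HealthLike(Protocol):
    """Hypothetical shape: truthy when healthy, with a human-readable reason."""

    reason: str | None

    def __bool__(self) -> bool: ...


def wait_until_healthy(
    check: Callable[[], HealthLike],
    attempts: int = 3,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> HealthLike:
    """Call `check` until it reports healthy, doubling the delay between tries.

    Returns the first healthy result. Raises RuntimeError with the last
    reported reason if all `attempts` calls report unhealthy.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    health = check()
    for _ in range(attempts - 1):
        if health:
            return health
        sleep(delay)  # back off before the next probe
        delay *= 2
        health = check()
    if not health:
        raise RuntimeError(f"System is unhealthy: {health.reason}")
    return health
```

With the SDK, the `check` argument would presumably be `SystemService.health_static` (the call used in the CLAUDE.md excerpts), for example `wait_until_healthy(SystemService.health_static)` before `client.runs.submit(...)`. Injecting `sleep` keeps the backoff testable without real delays.
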
diff --git a/README.md b/README.md
index cff34a2a2..c61a73da6 100644
--- a/README.md
+++ b/README.md
@@ -327,6 +327,21 @@ QuPath integration provides the most powerful way to visualize and interact with
 
 **Congratulations!** You have successfully downloaded a public dataset, submitted an Atlas H&E-TME analysis run, and learned how to access and inspect your results.
 
+### System Health Checks
+
+The Launchpad automatically monitors system health before allowing run submissions. If the system is unhealthy (e.g., network connectivity issues, authentication problems, or platform unavailability), the submission workflow is blocked:
+
+- A tooltip displays "System is unhealthy, you cannot prepare a run at this time."
+- The "Next" button in the application workflow is disabled.
+- The health status is shown in the footer bar at the bottom of the Launchpad.
+
+To resolve health issues:
+
+1. Check the health status indicator in the footer bar.
+2. Click "Info and Settings" in the menu to see detailed health information.
+3. Verify your network connection and authentication status.
+4. Check the [Aignostics Platform Status](https://status.aignostics.com) page.
+
 ### Advanced Setup: Extensions
 
 > 💡 The Launchpad features a growing ecosystem of extensions that seamlessly integrate with standard digital pathology tools. To use the Launchpad with all available extensions, run `uvx --from "aignostics[qupath,marimo]" aignostics launchpad`. Currently available extensions are:
@@ -400,6 +415,28 @@ Check out our
 [CLI reference documentation](https://aignostics.readthedocs.io/en/latest/cli_reference.html)
 to learn about all commands and options available.
 
+### System Health Checks
+
+The CLI automatically checks system health before uploading slides or submitting runs. If the system is unhealthy, the operation is blocked and an error message is displayed:
+
+```text
+Error: Platform is not healthy: <reason>. Aborting.
+```
+
+To skip the health check (not recommended for production use), add the `--force` flag:
+
+```shell
+uvx aignostics application run upload he-tme metadata.csv --force
+uvx aignostics application run submit he-tme metadata.csv --force
+uvx aignostics application run execute he-tme metadata.csv data/ --force
+```
+
+To manually check system health before running commands:
+
+```shell
+uvx aignostics system health
+```
+
 ## Python Library: Call the Aignostics Platform API from your Python scripts
 
 The Python SDK includes the *Aignostics Python Library* for integration with your Python codebase.
@@ -465,6 +502,30 @@ and read the
 [client reference documentation](https://aignostics.readthedocs.io/en/latest/lib_reference.html)
 to learn about all classes and methods.
 
+### System Health Checks
+
+The Python Library does **not** perform automatic health checks before operations. If your use case requires health verification, implement the check in your application logic:
+
+```python
+from aignostics import platform
+from aignostics.system import Service as SystemService
+
+# Check system health before submitting runs
+health = SystemService().health()
+if not health:
+    raise RuntimeError(f"System is unhealthy: {health.reason}")
+
+# Proceed with run submission
+client = platform.Client()
+run = client.runs.submit(...)
+```
+
+This design gives you full control over health check behavior, allowing you to:
+
+- Implement custom retry logic for transient failures
+- Log health status for monitoring and debugging
+- Gracefully handle unhealthy states in your application
+
 ### Example Notebooks: Interact with the Aignostics Platform from your Python Notebook environment
 
 > [!IMPORTANT]
@@ -856,9 +917,12 @@ Architectural style for web services that the Aignostics Platform API follows, e
 **Self-signed URLs**  
 Secure URLs with embedded authentication that allow the platform to access user data without exposing credentials.
 
-**SVS**  
+**SVS**  
 Aperio ScanScope Virtual Slide format, commonly used for whole slide images and supported by the platform.
 
+**System Health Check**  
+Automated verification that the SDK and the Aignostics Platform are operational before critical operations such as uploading slides or submitting runs. The Launchpad blocks run submission when the system is unhealthy; only users of internal organizations can override this. The CLI blocks `run upload`, `run submit`, and `run execute` by default but allows an override with `--force`. The Python Library performs no automatic health checks, leaving health verification entirely to the developer.
+
 ### T
 
 **Test Application**  
diff --git a/docs/partials/README_glossary.md b/docs/partials/README_glossary.md
index ae3905278..619a578c5 100644
--- a/docs/partials/README_glossary.md
+++ b/docs/partials/README_glossary.md
@@ -164,9 +164,12 @@ Architectural style for web services that the Aignostics Platform API follows, e
 **Self-signed URLs**  
 Secure URLs with embedded authentication that allow the platform to access user data without exposing credentials.
 
-**SVS**  
+**SVS**  
 Aperio ScanScope Virtual Slide format, commonly used for whole slide images and supported by the platform.
 
+**System Health Check**  
+Automated verification that the SDK and the Aignostics Platform are operational before critical operations such as uploading slides or submitting runs. The Launchpad blocks run submission when the system is unhealthy; only users of internal organizations can override this. The CLI blocks `run upload`, `run submit`, and `run execute` by default but allows an override with `--force`. The Python Library performs no automatic health checks, leaving health verification entirely to the developer.
+
 ### T
 
 **Test Application**  
diff --git a/docs/partials/README_main.md b/docs/partials/README_main.md
index 7acda4120..477da9b80 100644
--- a/docs/partials/README_main.md
+++ b/docs/partials/README_main.md
@@ -305,6 +305,21 @@ QuPath integration provides the most powerful way to visualize and interact with
 
 **Congratulations!** You have successfully downloaded a public dataset, submitted an Atlas H&E-TME analysis run, and learned how to access and inspect your results.
 
+### System Health Checks
+
+The Launchpad automatically monitors system health before allowing run submissions. If the system is unhealthy (e.g., network connectivity issues, authentication problems, or platform unavailability), the submission workflow is blocked:
+
+- A tooltip displays "System is unhealthy, you cannot prepare a run at this time."
+- The "Next" button in the application workflow is disabled.
+- The health status is shown in the footer bar at the bottom of the Launchpad.
+
+To resolve health issues:
+
+1. Check the health status indicator in the footer bar.
+2. Click "Info and Settings" in the menu to see detailed health information.
+3. Verify your network connection and authentication status.
+4. Check the [Aignostics Platform Status](https://status.aignostics.com) page.
+
 ### Advanced Setup: Extensions
 
 > 💡 The Launchpad features a growing ecosystem of extensions that seamlessly integrate with standard digital pathology tools. To use the Launchpad with all available extensions, run `uvx --from "aignostics[qupath,marimo]" aignostics launchpad`. Currently available extensions are:
@@ -378,6 +393,28 @@ Check out our
 [CLI reference documentation](https://aignostics.readthedocs.io/en/latest/cli_reference.html)
 to learn about all commands and options available.
 
+### System Health Checks
+
+The CLI automatically checks system health before uploading slides or submitting runs. If the system is unhealthy, the operation is blocked and an error message is displayed:
+
+```text
+Error: Platform is not healthy: <reason>. Aborting.
+```
+
+To skip the health check (not recommended for production use), add the `--force` flag:
+
+```shell
+uvx aignostics application run upload he-tme metadata.csv --force
+uvx aignostics application run submit he-tme metadata.csv --force
+uvx aignostics application run execute he-tme metadata.csv data/ --force
+```
+
+To manually check system health before running commands:
+
+```shell
+uvx aignostics system health
+```
+
 ## Python Library: Call the Aignostics Platform API from your Python scripts
 
 The Python SDK includes the *Aignostics Python Library* for integration with your Python codebase.
@@ -443,6 +480,30 @@ and read the
 [client reference documentation](https://aignostics.readthedocs.io/en/latest/lib_reference.html)
 to learn about all classes and methods.
 
+### System Health Checks
+
+The Python Library does **not** perform automatic health checks before operations. If your use case requires health verification, implement the check in your application logic:
+
+```python
+from aignostics import platform
+from aignostics.system import Service as SystemService
+
+# Check system health before submitting runs
+health = SystemService().health()
+if not health:
+    raise RuntimeError(f"System is unhealthy: {health.reason}")
+
+# Proceed with run submission
+client = platform.Client()
+run = client.runs.submit(...)
+```
+
+This design gives you full control over health check behavior, allowing you to:
+
+- Implement custom retry logic for transient failures
+- Log health status for monitoring and debugging
+- Gracefully handle unhealthy states in your application
+
 ### Example Notebooks: Interact with the Aignostics Platform from your Python Notebook environment
 
 > [!IMPORTANT]
diff --git a/src/aignostics/application/CLAUDE.md b/src/aignostics/application/CLAUDE.md
index b57443932..d14011e39 100644
--- a/src/aignostics/application/CLAUDE.md
+++ b/src/aignostics/application/CLAUDE.md
@@ -48,6 +48,57 @@ Core application operations:
 
 ## Architecture & Design Patterns
 
+### Health Check Gates
+
+The application module enforces system health checks before critical operations to prevent users from uploading data or submitting runs when the platform is unavailable.
+
+**CLI Health Check Enforcement (`_cli.py`):**
+
+The `_abort_if_system_unhealthy()` function runs before the `run upload`, `run submit`, and `run execute` operations unless `--force` is passed:
+
+```python
+def _abort_if_system_unhealthy() -> None:
+    """Check system health and abort if unhealthy."""
+    health = SystemService.health_static()
+    if not health:
+        logger.error(f"Platform is not healthy: {health.reason}. Aborting.")
+        console.print(f"[error]Error:[/error] Platform is not healthy: {health.reason}. Aborting.")
+        sys.exit(1)
+```
+
+**Commands with Health Check Gates:**
+
+| Command | Health Check | Override |
+|---------|--------------|----------|
+| `run execute` | Yes | `--force` |
+| `run upload` | Yes | `--force` |
+| `run submit` | Yes | `--force` |
+| `run prepare` | No | N/A |
+| `run list` | No | N/A |
+| `run describe` | No | N/A |
+| `run result download` | No | N/A |
+
+**GUI Health Check Enforcement (`_gui/_page_application_describe.py`):**
+
+The stepper workflow checks health at the application version selection step:
+
+```python
+# Check system health before allowing progression
+system_healthy = bool(SystemService.health_static())
+
+if not system_healthy:
+    version_next_button.disable()
+    ui.tooltip("System is unhealthy, you cannot prepare a run at this time.")
+
+    # Internal users (Aignostics, pre-alpha-org, LMU, Charite) can force-skip
+    if is_internal_user:
+        ui.checkbox("Force (skip health check)", on_change=on_force_change)
+```
+
+**Force Option:**
+
+The `submit_form.force` attribute records whether the user has opted to skip the health check. The checkbox that sets it is shown only to users of internal organizations.
+
 ### Module Structure (NEW in v1.0.0-beta.7)
 
 The application module is organized into focused submodules:
diff --git a/src/aignostics/gui/CLAUDE.md b/src/aignostics/gui/CLAUDE.md
index e37ac952c..c8896c614 100644
--- a/src/aignostics/gui/CLAUDE.md
+++ b/src/aignostics/gui/CLAUDE.md
@@ -35,9 +35,39 @@ The gui module provides common GUI framework components and theming for the Aign
 
 **Health Monitoring:**
 
-- `HEALTH_UPDATE_INTERVAL` - Configurable health check frequency
-- Real-time service status display in UI
-- Centralized health aggregation and reporting
+- `HEALTH_UPDATE_INTERVAL` - Configurable health check frequency (default: 30 seconds)
+- `USERINFO_UPDATE_INTERVAL` - User info refresh interval (default: 60 minutes)
+- Real-time service status display in UI footer
+- Centralized health aggregation and reporting via `SystemService.health_static()`
+
+**Health Check Enforcement:**
+
+The GUI enforces health checks before allowing critical operations:
+
+- **Footer Health Indicator**: Shows "Launchpad is healthy" (green) or "Launchpad is unhealthy" (red)
+- **Application Run Submission**: The "Next" button in the application workflow stepper is disabled when unhealthy
+- **Tooltip Feedback**: Users see "System is unhealthy, you cannot prepare a run at this time."
+- **Force Override**: Internal users (Aignostics, pre-alpha-org, LMU, Charite organizations) can enable a "Force (skip health check)" checkbox
+
+**Health State Management (`_frame.py`):**
+
+```python
+launchpad_healthy: bool | None = None  # None = loading, True = healthy, False = unhealthy
+
+async def _health_load_and_render() -> None:
+    nonlocal launchpad_healthy
+    with contextlib.suppress(Exception):
+        launchpad_healthy = bool(await run.cpu_bound(SystemService.health_static))
+    health_icon.refresh()
+    health_link.refresh()
+
+ui.timer(interval=HEALTH_UPDATE_INTERVAL, callback=_health_load_and_render, immediate=True)
+```
+
+**Health Display Components:**
+
+- `health_icon()` - Settings menu icon (green check or red error)
+- `health_link()` - Footer link with status text and icon
 
 **Error Handling:**
 
diff --git a/src/aignostics/system/CLAUDE.md b/src/aignostics/system/CLAUDE.md
index a4ed5e9b5..38adafa24 100644
--- a/src/aignostics/system/CLAUDE.md
+++ b/src/aignostics/system/CLAUDE.md
@@ -41,6 +41,60 @@ Core system operations and diagnostics:
 
 ## Architecture & Design Patterns
 
+### Health Check Enforcement
+
+The system module's health checks are used by other modules to gate critical operations. This ensures users don't submit runs or upload data when the platform is unavailable.
+
+**Enforcement by Interface:**
+
+| Interface | Behavior When Unhealthy | Override Mechanism |
+|-----------|------------------------|-------------------|
+| **Launchpad (GUI)** | "Next" button in the run stepper disabled; tooltip explains the issue | Internal users only: "Force (skip health check)" checkbox |
+| **CLI** | Operation aborted with error message (exit code 1) | `--force` flag on `run upload`, `run submit`, and `run execute` |
+| **Python Library** | No automatic enforcement | User implements own checks |
+
+**GUI Enforcement (in `application/_gui/_page_application_describe.py`):**
+
+```python
+# Check system health and determine if force option should be available
+system_healthy = bool(SystemService.health_static())
+
+# Disable the "Next" button if unhealthy
+if not system_healthy:
+    version_next_button.disable()
+    ui.tooltip("System is unhealthy, you cannot prepare a run at this time.")
+
+    # Internal users can force-skip health checks
+    if is_internal_user:
+        ui.checkbox("Force (skip health check)", on_change=on_force_change)
+```
+
+**CLI Enforcement (in `application/_cli.py`):**
+
+```python
+def _abort_if_system_unhealthy() -> None:
+    health = SystemService.health_static()
+    if not health:
+        logger.error(f"Platform is not healthy: {health.reason}. Aborting.")
+        console.print(f"[error]Error:[/error] Platform is not healthy: {health.reason}. Aborting.")
+        sys.exit(1)
+
+# Called before upload, submit, and execute operations unless --force is used
+if not force:
+    _abort_if_system_unhealthy()
+```
+
+**Python Library Usage:**
+
+```python
+from aignostics.system import Service as SystemService
+
+# Manual health check before operations
+health = SystemService().health()
+if not health:
+    raise RuntimeError(f"System unhealthy: {health.reason}")
+```
+
 ### Health Check Aggregation Pattern
 
 The system module's health check aggregates status from **ALL modules** in the SDK by discovering and querying every service that inherits from `BaseService`: