feat: prevent submissions when system is unhealthy#336
Conversation
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (68.62%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
|
2d428ed to
7986cac
Compare
Comprehensive PR Review: feat/health-check-before-submit🎯 SummaryThis PR introduces system health checks before run submission to prevent users from submitting runs when the platform is unhealthy. The implementation follows the SDK's modulith architecture correctly by placing the checks in presentation layers (CLI & GUI) rather than the service layer, maintaining proper separation of concerns. ✅ PASSING: All Critical Checks1. ✅ Test Markers (CRITICAL)All tests are properly marked with required category markers:
Verification: 2. ✅ Linting & Code QualityAll checks pass:
3. ✅ Conventional CommitsCommit message follows convention: 4. ✅ Test CoverageThree new unit tests cover health check scenarios with proper mocking ( 📋 Implementation ReviewArchitecture Compliance ✅EXCELLENT DESIGN CHOICE: Health checks are implemented in presentation layers (CLI & GUI) rather than the service layer:
This maintains the modulith architecture principle of avoiding circular dependencies between CLI Implementation Analysis ✅File: New Components:
Implementation Pattern: def _abort_if_system_unhealthy() -> None:
health = SystemService.health_static()
if not health:
logger.error(f"Platform is not healthy: {health.reason}. Aborting.")
console.print(f"[error]Error:[/error] Platform is not healthy: {health.reason}. Aborting.")
sys.exit(1)Usage Pattern: def run_submit(..., force: ForceOption = False) -> str:
if not force:
_abort_if_system_unhealthy()
# ... rest of submission logic✅ Pros:
GUI Implementation Analysis ✅File: Health Check Logic (lines 303-339): # Check system health
system_healthy = bool(SystemService.health_static())
# Check if user is internal
is_internal_user = (
user_info and user_info.organization
and user_info.organization.name.lower() in {"aignostics", "pre-alpha-org", "lmu", "charite"}
)
# Disable next button if unhealthy
if not system_healthy:
version_next_button.disable()
unhealthy_tooltip = ui.tooltip("System is unhealthy, you cannot prepare a run at this time.")
# Show force checkbox for internal users only
if is_internal_user:
ui.checkbox("Force (skip health check)", on_change=on_force_change).mark("CHECKBOX_FORCE")✅ Pros:
This is acceptable for this PR—checking at every step would add complexity. This can be revisited if needed. CLI vs GUI Divergence (Acknowledged in PR)
This divergence is reasonable and documented in the PR description. 🧪 Testing Strategy ✅New Tests Added1.
|
| logger.error(f"Platform is not healthy: {health.reason}. Aborting.") | ||
| console.print(f"[error]Error:[/error] Platform is not healthy: {health.reason}. Aborting.") | ||
| sys.exit(1) | ||
|
|
There was a problem hiding this comment.
✅ Excellent error handling pattern
This function correctly:
- Uses
SystemService.health_static()to check system health - Logs the error with reason for debugging
- Provides user-friendly console output
- Exits with code 1 (appropriate for operational errors)
Optional improvement: Consider adding structured logging with user context:
logger.error(
"Platform health check failed - aborting run submission",
extra={"reason": health.reason, "user": get_current_user()}
)This would help with operational debugging and audit trails.
| # Check if user is internal (can use force option) | ||
| user_info: UserInfo | None = app.storage.tab.get("user_info", None) | ||
| is_internal_user = ( | ||
| user_info | ||
| and user_info.organization | ||
| and user_info.organization.name | ||
| and user_info.organization.name.lower() in {"aignostics", "pre-alpha-org", "lmu", "charite"} | ||
| ) |
There was a problem hiding this comment.
Hardcoded internal organization list
While this works, consider moving this list to a configuration file for easier maintenance:
# In _settings.py or system-level config:
class ApplicationSettings(BaseSettings):
internal_organizations: set[str] = {\"aignostics\", \"pre-alpha-org\", \"lmu\", \"charite\"}
# Then use:
is_internal_user = (
user_info and user_info.organization
and settings.is_internal_organization(user_info.organization.name)
)Benefits:
- Easier to add/remove organizations without code changes
- Can be overridden via environment variables
- Centralized configuration management
- Better testability (can mock settings)
| def test_cli_run_submit_fails_when_system_unhealthy_and_no_force( | ||
| mock_health: MagicMock, runner: CliRunner, tmp_path: Path | ||
| ) -> None: | ||
| """Check run submit command exits with code 1 when system is unhealthy and --force is not used.""" | ||
| mock_health.return_value = Health( | ||
| status=Health.Code.DOWN, | ||
| reason="Simulated unhealthy system for testing", | ||
| ) | ||
| csv_content = "external_id;checksum_base64_crc32c;resolution_mpp;width_px;height_px;staining_method;tissue;disease;" | ||
| csv_content += "platform_bucket_url\n" | ||
| csv_content += ";5onqtA==;0.26268186053789266;7447;7196;H&E;LUNG;LUNG_CANCER;gs://bucket/test" | ||
| csv_path = tmp_path / "dummy.csv" | ||
| csv_path.write_text(csv_content) | ||
|
|
||
| result = runner.invoke( | ||
| cli, | ||
| [ | ||
| "application", | ||
| "run", | ||
| "submit", | ||
| HETA_APPLICATION_ID, | ||
| str(csv_path), | ||
| ], | ||
| ) | ||
|
|
||
| assert result.exit_code == 1 | ||
|
|
There was a problem hiding this comment.
✅ Excellent unit test with proper mocking
This test correctly:
- Uses
@pytest.mark.unitmarker - Mocks
SystemService.health_static()to return unhealthy state - Creates minimal test data (CSV with required fields)
- Asserts exit code 1 for unhealthy system
Suggestion: Consider adding a companion E2E test that verifies --force works end-to-end:
@pytest.mark.e2e
@pytest.mark.timeout(timeout=60)
def test_cli_run_submit_with_force_bypasses_health_check(runner, tmp_path):
\"\"\"Verify --force flag allows submission regardless of health status.\"\"\"
# ... setup csv ...
result = runner.invoke(cli, [
\"application\", \"run\", \"submit\",
HETA_APPLICATION_ID, str(csv_path), \"--force\"
])
# Should not fail due to health check
assert result.exit_code == 0 or \"not found\" in result.stdoutThis would provide confidence that the force flag works in production-like conditions.
| with contextlib.suppress(AssertionError): | ||
| # Click force checkbox if system is unhealthy (checkbox only appears when unhealthy) | ||
| user.find(marker="CHECKBOX_FORCE").click() |
There was a problem hiding this comment.
✅ Smart use of contextlib.suppress
This is the correct approach for handling conditional UI elements:
- The checkbox only appears when system is unhealthy
- Using
suppress(AssertionError)allows the test to pass regardless - Maintains test reliability across different system health states
Alternative consideration: If you want to verify the health check logic explicitly, you could add a separate test:
@pytest.mark.integration
async def test_gui_force_checkbox_appears_when_unhealthy(page, user, monkeypatch):
\"\"\"Verify force checkbox only appears for internal users when system is unhealthy.\"\"\"
# Mock unhealthy system
monkeypatch.setattr(SystemService, \"health_static\", lambda: Health(status=Health.Code.DOWN))
# ... navigate to application version selection ...
# Internal users should see the checkbox
await user.should_see(\"Force (skip health check)\")This would provide explicit test coverage of the health check UI behavior.
|
|
Claude finished @olivermeyer's task —— View job Comprehensive Review: PR #336 - Health Check Before Submit ✅I acknowledge the guidance in CLAUDE.md and have applied the highest standards for this medical device software review. Executive SummaryRECOMMENDATION: ✅ APPROVE with optional improvements This PR successfully implements system health checks before run submission in both CLI and GUI presentation layers. The implementation correctly follows the SDK's modulith architecture principles by placing health checks in the presentation layer rather than the service layer, avoiding inappropriate dependencies. Key Achievements:
Areas for Improvement:
Review Checklist✅ 1. Test Markers (CRITICAL)Status: PASSED ✓ All tests have proper category markers:
✓ Verified: uv run pytest -m "not unit and not integration and not e2e" --collect-only
Result: 0 tests selected (all tests properly marked)
|




This PR aims to prevent users from submitting runs while the system is unhealthy. The checks happen in the presentation layer (GUI and CLI), to avoid making the run submission code aware of system health.
In the GUI, we disable the Next button in the application version selection step and add a tooltip. Internal users see a checkbox allowing them to override this.

A limitation to this implementation is that if the system becomes unhealthy after a user has started the run submission flow, we don't prevent them from submitting the run. We can revisit this at a later stage if required.
In the CLI, we check the system health in the

application run uploadandapplication run submit(which apply toapplication run executetransitively), and exit with an error message if the system is unhealthy. The system health check can be skipped in all three commands with the--forceflag.This introduces a divergence between the CLI and GUI: in the GUI, only internal users can force runs through, while in the CLI this is available to all users.
This also introduces a larger divergence between the CLI and GUI on one hand, and the lower-level SDK on the other (e.g.
platform.Client().runs.submit(), as we don't check the system health in the latter. This avoids introducing a dependency from a low-level component (platform.resources.runs.Runs) to a high-level component (the System service).