Fix volume bind #20

Merged

t0mdavid-m merged 79 commits into main from fix_volume_bind
Apr 27, 2026
Conversation

@t0mdavid-m
Member

No description provided.

t0mdavid-m and others added 30 commits February 20, 2026 15:24
* Add Matomo Tag Manager as third analytics tracking mode

Adds Matomo Tag Manager support alongside existing Google Analytics and
Piwik Pro integrations. Includes settings.json configuration (url + tag),
build-time script injection via hook-analytics.py, Klaro GDPR consent
banner integration, and runtime consent granting via MTM data layer API.

https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h

* Fix Matomo Tag Manager snippet to match official docs

- Accept full container JS URL instead of separate url + tag fields,
  supporting both self-hosted and Matomo Cloud URL patterns
- Match the official snippet: var _mtm alias, _mtm.push shorthand
- Remove redundant type="text/javascript" attribute
- Remove unused "tag" field from settings.json

https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h

* Split Matomo config into base url + tag fields

Separate the Matomo setting into `url` (base URL, e.g.
https://cdn.matomo.cloud/openms.matomo.cloud) and `tag` (container ID,
e.g. yDGK8bfY), consistent with how other providers use a tag field.
The script constructs the full path: {url}/container_{tag}.js

https://claude.ai/code/session_0165AXHkmRZ6bx23n7Tbyz8h
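The path construction described above can be sketched as a quick shell check (the example values are the ones quoted in the commit message):

```shell
# Build the full Matomo Tag Manager container URL from the two settings fields.
url="https://cdn.matomo.cloud/openms.matomo.cloud"   # base URL ("url" field)
tag="yDGK8bfY"                                       # container ID ("tag" field)

container_js="${url}/container_${tag}.js"
echo "$container_js"
```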

* install matomo tag

---------

Co-authored-by: Claude <noreply@anthropic.com>
* Initial plan

* fix: remove duplicate address entry in config.toml

Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>
…til.SameFileError (#349)

* Initial plan

* Fix integration test failures: restore sys.modules mocks, handle SameFileError, update CI workflow

Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>

* Remove unnecessary pyopenms mock from test_topp_workflow_parameter.py, simplify test_parameter_presets.py

Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>

* Fix Windows build: correct site-packages path in cleanup step

Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: t0mdavid-m <57191390+t0mdavid-m@users.noreply.github.com>
…(#351)

On Windows, 0.0.0.0 is not a valid connect address — the browser fails
to open http://0.0.0.0:8501. By removing the address entry from the
bundled .streamlit/config.toml, Streamlit defaults to localhost, which
works correctly for local deployments. Docker deployments are unaffected
as they pass --server.address 0.0.0.0 on the command line.

https://claude.ai/code/session_016amsLCZeFogTksmtk1geb5

Co-authored-by: Claude <noreply@anthropic.com>
* Add CLAUDE.md and Claude Code skills for webapp development

Adds project documentation (CLAUDE.md) and 6 skills to help developers
scaffold and extend OpenMS web applications built from this template:
- /create-page: add a new Streamlit page with proper registration
- /create-workflow: scaffold a full TOPP workflow (class + 4 pages)
- /add-python-tool: add a custom Python analysis script with auto-UI
- /add-presets: add parameter presets for workflows
- /configure-deployment: set up Docker and CI/CD for a new app
- /add-visualization: add pyopenms-viz or OpenMS-Insight visualizations

https://claude.ai/code/session_01WYotmLfqRtB8WJXj1Eosiz

* Strengthen MS domain context in CLAUDE.md and skills

Make it clear to Claude that this is THE framework for building mass
spectrometry web applications for proteomics and metabolomics research.
Add domain-specific context about MS data types, TOPP tool pipelines,
and scientific visualization needs.

https://claude.ai/code/session_01WYotmLfqRtB8WJXj1Eosiz

---------

Co-authored-by: Claude <noreply@anthropic.com>
* Add Kubernetes manifests and CI workflows for de.NBI migration

Decompose the monolithic Docker container into Kubernetes workloads:
- Streamlit Deployment with health probes and session affinity
- Redis Deployment + Service for job queue
- RQ Worker Deployment for background workflows
- CronJob for workspace cleanup
- Ingress with WebSocket support and cookie-based sticky sessions
- Shared PVC (ReadWriteMany) for workspace data
- ConfigMap for runtime configuration (replaces build-time settings)
- Kustomize base + template-app overlay for multi-app deployment

Code changes:
- Remove unsafe enableCORS=false and enableXsrfProtection=false from config.toml
- Make workspace path configurable via WORKSPACES_DIR env var in clean-up-workspaces.py

CI/CD:
- Add build-and-push-image.yml to push Docker images to ghcr.io
- Add k8s-manifests-ci.yml for manifest validation and kind integration tests

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix kubeconform validation to skip kustomization.yaml

kustomization.yaml is a Kustomize config file, not a standard K8s resource,
so kubeconform has no schema for it. Exclude it via -ignore-filename-pattern.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Add matrix strategy to test both Dockerfiles in integration tests

The integration-test job now uses a matrix with Dockerfile_simple and
Dockerfile. Each matrix entry checks if its Dockerfile exists before
running — all steps are guarded with an `if` condition so they skip
gracefully when a Dockerfile is absent. This allows downstream forks
that only have one Dockerfile to pass CI without errors.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
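The existence guard each matrix entry applies can be approximated in shell (the actual workflow expresses this as a YAML `if:` condition; the variable name here is illustrative):

```shell
# Skip gracefully when this matrix entry's Dockerfile is absent, mirroring the
# workflow's `if` guards. DOCKERFILE stands in for the matrix variable.
DOCKERFILE="Dockerfile_simple"
if [ -f "$DOCKERFILE" ]; then
  echo "building with $DOCKERFILE"
  # docker build -f "$DOCKERFILE" -t test-image .
else
  echo "skipping: $DOCKERFILE not present"
fi
```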

* Adapt K8s base manifests for de.NBI Cinder CSI storage

- Switch workspace PVC from ReadWriteMany to ReadWriteOnce with
  cinder-csi storage class (required by de.NBI KKP cluster)
- Increase PVC storage to 500Gi
- Add namespace: openms to kustomization.yaml
- Reduce pod resource requests (1Gi/500m) and limits (8Gi/4 CPU)
  so all workspace-mounting pods fit on a single node

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Add pod affinity rules to co-locate all workspace pods on same node

The workspaces PVC uses ReadWriteOnce (Cinder CSI block storage) which
requires all pods mounting it to run on the same node. Without explicit
affinity rules, the scheduler was failing silently, leaving pods in
Pending state with no events.

Adds a `volume-group: workspaces` label and podAffinity with
requiredDuringSchedulingIgnoredDuringExecution to streamlit deployment,
rq-worker deployment, and cleanup cronjob. This ensures the scheduler
explicitly co-locates all workspace-consuming pods on the same node.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix CI: wait for ingress-nginx admission webhook before deploying

The controller pod being Ready doesn't guarantee the admission webhook
service is accepting connections. Add a polling loop that waits for the
webhook endpoint to have an IP assigned before applying the Ingress
resource, preventing "connection refused" errors during kustomize apply.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix CI: add -n openms namespace to integration test steps

The kustomize overlay deploys into the openms namespace, but the
verification steps (Redis wait, Redis ping, deployment checks) were
querying the default namespace, causing "no matching resources found".

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix CI: retry kustomize deploy for webhook readiness

Replace the unreliable endpoint-IP polling with a retry loop on
kubectl apply (up to 5 attempts with backoff). This handles the race
where the ingress-nginx admission webhook has an endpoint IP but isn't
yet accepting TCP connections.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
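A minimal sketch of such a retry loop (the attempt count comes from the commit message; the backoff schedule here is an assumption):

```shell
# Retry a command up to N times, sleeping a little longer between attempts.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep "$i"   # simple linear backoff
    i=$((i + 1))
  done
  return 1
}

# In CI this would wrap the flaky step, e.g.:
#   retry 5 kubectl apply -k k8s/overlays/template-app
```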

* Fix REDIS_URL to use prefixed service name in overlay

Kustomize namePrefix renames the Redis service to template-app-redis,
but the REDIS_URL env var in streamlit and rq-worker deployments still
referenced the unprefixed name "redis", causing the rq-worker to
CrashLoopBackOff with "Name or service not known".

Add JSON patches in the overlay to set the correct prefixed hostname.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Add Traefik IngressRoute for direct LB IP access

The cluster uses Traefik, not nginx, so the nginx Ingress annotations
are ignored. Add a Traefik IngressRoute with PathPrefix(/) catch-all
routing and sticky session cookie for Streamlit session affinity.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix CI: skip Traefik IngressRoute CRD in validation and integration tests

kubeconform doesn't know the Traefik IngressRoute CRD schema, and the
kind cluster in integration tests doesn't have Traefik installed. Skip
the IngressRoute in kubeconform validation and filter it out with yq
before applying to the kind cluster.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Fix IngressRoute service name for kustomize namePrefix

Kustomize namePrefix doesn't rewrite service references inside CRDs,
so the IngressRoute was pointing to 'streamlit' instead of
'template-app-streamlit', causing Traefik to return 404.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* fix: use ConfigMap as settings override instead of full replacement

The ConfigMap was replacing the entire settings.json, losing keys like
"version" and "repository-name" that the app expects (causing KeyError).
Now the ConfigMap only contains deployment-specific overrides, which are
merged into the Docker image's base settings.json at container startup
using jq.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ
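The startup merge can be sketched with jq's recursive-merge operator (the file contents and paths here are stand-ins; the real entrypoint may differ):

```shell
set -euo pipefail   # fail fast if the merge goes wrong (see the follow-up commit)

# Stand-ins for the image's base settings.json and the ConfigMap overrides.
base=$(mktemp)
override=$(mktemp)
printf '%s' '{"version": "1.0.0", "repository-name": "streamlit-template", "analytics": {"mode": "none"}}' > "$base"
printf '%s' '{"analytics": {"mode": "matomo"}}' > "$override"

# jq's '*' operator deep-merges objects: override keys win, base-only keys
# such as "version" and "repository-name" survive.
jq -s '.[0] * .[1]' "$base" "$override"
```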

* fix: add set -euo pipefail to fail fast on settings merge error

Addresses CodeRabbit review: if jq merge fails, the container should
not start with unmerged settings.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* fix: change imagePullPolicy to Always for mutable main tag

With IfNotPresent, rollout restarts reuse the cached image even when a
new version has been pushed with the same tag. Always ensures Kubernetes
pulls the latest image on every pod start.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* fix: build full Dockerfile instead of Dockerfile_simple

Switch CI to build the full Docker image with OpenMS and TOPP tools,
not the lightweight pyOpenMS-only image.

https://claude.ai/code/session_01RNJ3dVjV1VTHcC9ugE3FQJ

* Scope IngressRoute to hostname and drop unused nginx Ingress

Traefik is the only ingress controller on the cluster; the nginx Ingress
in k8s/base/ingress.yaml was orphaned (no nginx class available) and the
overlay was patching it instead of the active Traefik IngressRoute.

- Add Host() match to the base IngressRoute (placeholder filled by overlays)
- template-app overlay patches the IngressRoute with template.webapps.openms.de
- Remove ingress.yaml from the base kustomization resources list (file kept
  in the repo for nginx-based consumers)

https://claude.ai/code/session_01YNDYJTx1eSKaL9vQe1GQzV

* fix: use PVC mount for workspaces in online mode

In online mode, src/common/common.py hard-coded workspaces_dir to the
literal ".." which, from WORKDIR /app, resolved to /. Workspace UUID
directories were therefore created on each pod's ephemeral local
filesystem instead of the shared PVC mounted at
/workspaces-streamlit-template, so the Streamlit pod and the RQ worker
each saw their own disconnected copy. The worker's params.json load in
tasks.py then hit an empty dict, producing `KeyError: 'mzML-files'` as
soon as Workflow.execution() ran.

- common.py: in the online branch, use WORKSPACES_DIR env var (default
  /workspaces-streamlit-template) so Streamlit, the RQ worker, and the
  cleanup cronjob (which already reads WORKSPACES_DIR) all agree on one
  location.
- k8s streamlit & rq-worker deployments: set WORKSPACES_DIR explicitly so
  the env is overridable and visible at deploy time.
- WorkflowManager.start_workflow: call save_parameters() before dispatch
  so the latest session state is flushed to disk, closing a small race
  where a fragment rerun could leave params.json stale when the worker
  picked up the job.

https://claude.ai/code/session_01TsxtENPpuCZ1Ap3mX2ZpHr
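The path bug and its fix can be demonstrated in shell (the default path is the one named in the commit):

```shell
# The bug: a relative workspaces_dir of ".." is resolved against the container
# WORKDIR /app, landing on the pod-local root filesystem instead of the PVC.
realpath -m /app/..    # resolves to "/"

# The fix: resolve one shared root from the environment, with the PVC mount
# point as the default, so Streamlit, the RQ worker, and the cleanup cronjob
# all agree on the same location.
WORKSPACES_DIR="${WORKSPACES_DIR:-/workspaces-streamlit-template}"
echo "$WORKSPACES_DIR"
```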

---------

Co-authored-by: Claude <noreply@anthropic.com>
* fix(ci): pin OpenMS contrib download to matching release tag

The Windows build step downloaded contrib_build-Windows.tar.gz from
OpenMS/contrib without a --tag, always pulling the latest release.
When the GH Actions cache (7-day eviction) expired, a newer contrib
got pulled that was incompatible with the pinned OpenMS release/3.5.0
source tree, breaking MSVC compilation in DIAPrescoring.cpp.

Pin the download to release/${OPENMS_VERSION} and tie the cache key
to the OpenMS version so contrib stays in lockstep with the source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): pass release tag as positional arg to gh release download

`gh release download` takes the tag as a positional argument, not a
`--tag` flag; the previous invocation silently failed to match on Windows
with the system error "The system cannot find the file specified".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: allow contrib version override via OPENMS_CONTRIB_VERSION

Adds OPENMS_CONTRIB_VERSION env var that falls back to OPENMS_VERSION
when empty. Lets us point OPENMS_VERSION at a non-release branch (e.g.
develop) while keeping the Windows contrib download pinned to a known
release tag, so CI doesn't fail on a missing contrib release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
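The fallback can be sketched with standard parameter expansion (the workflow wiring around it is an assumption):

```shell
# When OPENMS_CONTRIB_VERSION is unset or empty, fall back to OPENMS_VERSION.
OPENMS_VERSION="develop"                # may point at a non-release branch
OPENMS_CONTRIB_VERSION="release/3.5.0"  # pin contrib to a known release tag

CONTRIB_TAG="${OPENMS_CONTRIB_VERSION:-$OPENMS_VERSION}"
echo "$CONTRIB_TAG"   # -> release/3.5.0

# The Windows step then downloads the matching contrib release, e.g.:
#   gh release download "$CONTRIB_TAG" --repo OpenMS/contrib \
#     --pattern 'contrib_build-Windows.tar.gz'
```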

* chore: ignore docs/superpowers/ (local design notes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Remove stale patches from template-app overlay

The Deployment/streamlit patch with Ingress-shaped path /spec/rules/0/host
never applied and produced a silent no-op. The duplicate IngressRoute
service-name patch was redundant with the first IngressRoute patch block.
This brings the on-disk overlay in line with the production cluster's
running version.

* Rename configure-deployment skill to configure-docker-compose-deployment

First step of splitting the skill into three focused skills
(configure-app-settings, configure-docker-compose-deployment,
configure-k8s-deployment). Rename is in its own commit so
git log --follow traces the docker-compose content cleanly.

* Scope docker-compose skill to docker-compose-only

Removes app-level content (settings.json, Dockerfile choice, production
app examples) that will live in configure-app-settings. Adds a
prerequisite note pointing to configure-app-settings.

* Add configure-app-settings skill

Covers app-level configuration (settings.json, Dockerfile choice,
README, dependencies) shared by every deployment mode. Prerequisite
for configure-docker-compose-deployment and configure-k8s-deployment.

* Fix settings.json key-field list inconsistency

The Key fields prose listed max_threads (not in the JSON sample) and
omitted enable_workspaces (which is in the sample). Align the prose
with the sample and describe max_threads separately since it is a
nested object rather than a flat field.

* Add configure-k8s-deployment skill

New skill walking through Kustomize overlay creation and kubectl apply
for deploying a forked app to Kubernetes. Patch list reflects the
three-patch canonical shape (IngressRoute match + service, streamlit
Redis URL, rq-worker Redis URL).

* Fix inline-code rendering in k8s skill

The Host(`...`) escape syntax produced literal backslashes that
broke the inline-code span when rendered by markdown parsers. Rewrite
as Host(...) without nested backticks so the span renders cleanly.

* Add K8s deployment doc — overview and architecture sections

* Add K8s deployment doc — manifest reference section

* Add K8s deployment doc — fork-and-deploy guide

* Add K8s deployment doc — CI/CD pipeline section

* Clarify PR-blocking behavior depends on branch protection

The workflow does not block merges directly — it produces a check
status that a branch-protection rule can gate on. Make the
preconditions explicit.

* Register Kubernetes Deployment page in Streamlit documentation

* Cross-link docs/deployment.md to Kubernetes deployment page

Adds a preamble listing both deployment paths and introduces a
## Docker Compose heading above the existing content. The existing
docker-compose content is preserved verbatim.

* Add smoke test for Kubernetes Deployment documentation page

Extends the parametrized test_documentation cases to cover the new
Documentation page added by this branch, closing the gap where it
was the only selectbox entry without test coverage.
ci: unified docker workflow (shadow mode)
github.repository preserves the original casing (OpenMS/streamlit-template).
Docker OCI references require lowercase, so cache-from/cache-to fail with
'invalid reference format'. docker/metadata-action handles this internally
for tags, but the cache refs bypass it. Compute IMAGE_NAME_LC once and use
it in both cache refs.
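
A minimal sketch of the lowercasing step (variable names are assumptions):

```shell
# github.repository preserves the original casing; OCI refs must be lowercase
IMAGE_NAME="OpenMS/streamlit-template"
IMAGE_NAME_LC="$(printf '%s' "$IMAGE_NAME" | tr '[:upper:]' '[:lower:]')"
echo "$IMAGE_NAME_LC"
```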
ci: lowercase image name for OCI cache refs
With push: true, docker/build-push-action pushes every tag in its tags
input. A bare name like 'openms-streamlit:simple-test' (no registry
prefix) gets resolved to Docker Hub and fails with 401 unauthorized,
because the workflow's GHCR token has no rights on docker.io.

The local tag was only needed for the kind retag step. Since load: true
already loads the image into the runner's docker daemon, we can create
the stable local alias with a plain 'docker tag' step after build,
picking any tag from docker/metadata-action's output.
ci: don't pass unprefixed local tag to buildx push
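
The retag approach can be sketched as workflow steps (the step id `meta` and the alias name are assumptions):

```yaml
# Sketch: build loads the image locally; only registry-prefixed tags are pushed
- name: Build image
  uses: docker/build-push-action@v5
  with:
    load: true
    push: ${{ github.event_name != 'pull_request' }}
    tags: ${{ steps.meta.outputs.tags }}
# Stable local alias for the kind retag step, created after the build
- name: Tag local alias
  run: |
    FIRST_TAG="$(echo "${{ steps.meta.outputs.tags }}" | head -n1)"
    docker tag "$FIRST_TAG" openms-streamlit:local
```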
ci: cut over from old docker workflows to build-and-test
The @v3 floating tag does not exist on snok/container-retention-policy
(v2 is the latest floating major tag; v3 only has v3.0.0 and v3.0.1
as exact version tags). The workflow fails to resolve the action with
'unable to find version v3'. Pin to v3.0.1 (latest v3 release).
The ENV GH_TOKEN=${GITHUB_TOKEN} at the top baked the per-run token
into an early layer, so every workflow run rebuilt from scratch.
Moved the ARG next to the one RUN that uses it (gh release download)
so earlier layers stay cacheable.
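
A sketch of the cache-friendly layout (base image, packages, and release ref are assumptions, not the actual Dockerfile):

```dockerfile
# Early layers carry no per-run values, so they stay cacheable
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends gh ca-certificates

# ARG declared immediately before the single RUN that needs it; only this
# layer rebuilds when the per-run token changes
ARG GITHUB_TOKEN
RUN GH_TOKEN=$GITHUB_TOKEN gh release download --repo OpenMS/OpenMS
```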
t0mdavid-m and others added 27 commits April 24, 2026 11:29
Mirrors the base example with overlay-specific guidance: `namePrefix`
only rewrites Kustomize-managed resources, so imperative Secrets must
still use the literal name `streamlit-secrets`.
k8s: mount admin password from streamlit-secrets Secret
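
The namePrefix interaction can be sketched as (prefix value is an assumption):

```yaml
# overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: myfork-       # rewrites names of resources Kustomize manages
resources:
  - ../../base
# A Secret created imperatively, e.g.
#   kubectl create secret generic streamlit-secrets --from-literal=admin-password=...
# is outside Kustomize's view, so it must keep the literal name streamlit-secrets.
```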
Factor node placement and memory sizing out of the base manifests into
reusable Kustomize components (memory-tier-low / memory-tier-high), so
each fork picks its tier with a single line in its overlay.

- base: remove per-pod `resources` from streamlit and rq-worker
  Deployments; sizing now comes from the tier component
- base: promote redis to Guaranteed QoS (requests == limits for both
  cpu and memory) so it bottoms the kernel OOM list
- base: add LimitRange so containers without explicit resources inherit
  safe defaults (512Mi/250m request, 2Gi/2 limit, 64Gi/16 max)
- components/memory-tier-low: nodeSelector=low, streamlit 512Mi/2Gi,
  rq-worker 1Gi/16Gi (Burstable)
- components/memory-tier-high: nodeSelector=high, streamlit 512Mi/4Gi,
  rq-worker 2Gi/180Gi (Burstable — uniform across heavy workers so a
  single active app can burst into the shared pool)
- overlays: rename template-app/ to prod/ (one overlay per repo; the
  repo itself identifies the app) and pull in memory-tier-low
- docs & skill: document the new overlays/prod/ path and the one-line
  tier selector; update CI to kustomize the renamed overlay

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
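
The one-line tier selector in a fork's overlay can be sketched as:

```yaml
# overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
components:
  - ../../components/memory-tier-low   # or ../../components/memory-tier-high
```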
The memory-tier-low component adds nodeSelector
openms.de/memory-tier=low to every Deployment. kind clusters have no
such label, so after the rename to overlays/prod all pods stayed
Pending and 'Wait for Redis to be ready' timed out.

Label --all kind nodes in both the nginx and Traefik integration jobs
before deploying so the nodeSelector matches.

Also raise the LimitRange max.memory from 64Gi to 200Gi. The original
cap was written before memory-tier-high settled on a 180Gi rq-worker
limit; without the bump, a high-tier fork (e.g. OpenDIAKiosk) would be
rejected by admission when deployed into the shared openms namespace
after the template's LimitRange is applied.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Completes the overlay rename started in 6c61365 now that the branch
has merged main, which added the example file under the old path.

Also rewrite two remaining docs references to overlays/<your-app-name>/
and the CI description to the new prod overlay.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Spin up a 2-node kind cluster (control-plane labeled memory-tier=low
+ ingress-ready, worker labeled memory-tier=high) so the Build-and-Test
job passes regardless of which memory-tier component a fork's overlay
pulls in. Previously we labeled --all nodes with a single tier after
creation, which broke as soon as a fork flipped memory-tier-low to
memory-tier-high.

- .github/kind-config.yaml: 2-node topology with per-node labels.
- .github/workflows/build-and-test.yml: point both helm/kind-action
  invocations (nginx build + traefik-integration) at the config and
  drop the now-redundant dynamic label step.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
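
The described topology can be sketched as a kind config (label keys taken from the commits above):

```yaml
# .github/kind-config.yaml (sketch): 2-node cluster with per-node labels
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    labels:
      openms.de/memory-tier: "low"
      ingress-ready: "true"
  - role: worker
    labels:
      openms.de/memory-tier: "high"
```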
Previous run (2f28ed9) showed build + traefik-integration jobs still
timing out on 'Wait for Redis'. Root cause: multi-node kind clusters
apply node-role.kubernetes.io/control-plane:NoSchedule to the
control-plane, which untolerated app pods can't land on even though
the nodeSelector matches. The single-node kind used previously had
no such taint, which is why CI worked until we added a second node.

Add a kubeadmConfigPatches stanza setting nodeRegistration.taints to
the empty list so the control-plane is schedulable. Labels and
cluster shape (1 control-plane + 1 worker) stay the same.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
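
The taint-clearing stanza can be sketched as:

```yaml
# Sketch: empty taints list makes the kind control-plane schedulable
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          taints: []
```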
…imization-RoNnJ

Refactor K8s deployment to use memory-tier components
Adds a seed-demos initContainer to the Streamlit Deployment that merges
image-shipped demos into /workspaces-streamlit-template/.demos/ with
cp -rn, so new demos in an image appear after redeploy while admin-saved
demos and edits persist across redeploys.

- Point demo_workspaces.source_dirs at the PV path via the ConfigMap
  override (both streamlit and rq-worker pick this up through the jq
  settings merge at startup).
- Make get_demo_target_dir() settings-driven so "Save as Demo" writes
  to the PV, with backwards-compatible fallbacks for the legacy
  source_dir string and for environments without settings (tests).
- Skip hidden top-level dirs in clean-up-workspaces.py so the nightly
  cron does not garbage-collect .demos/.
- Document the .demos/ layout and the re-seed flow.

https://claude.ai/code/session_01Y87aULHSdyBobPdaD4L6tW
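
The merge semantics of `cp -rn` that the initContainer relies on can be demonstrated locally (directory names are illustrative):

```shell
# image/ stands in for the image-shipped demos, pv/ for the persistent volume
mkdir -p image/.demos pv/.demos
echo new      > image/.demos/new-demo.txt    # demo added in a newer image
echo original > image/.demos/existing.txt    # image copy of an existing demo
echo edited   > pv/.demos/existing.txt       # admin-edited copy on the PV
# -n never overwrites; some coreutils versions exit nonzero on skip, hence || true
cp -rn image/.demos/. pv/.demos/ || true
cat pv/.demos/existing.txt   # admin edit preserved
cat pv/.demos/new-demo.txt   # new demo appears after redeploy
```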
…-azhkG

Support configurable demo workspace source directories
The Secret used to be an out-of-band copy-the-example step, so forgetting
the resources-list edit left the pod booting with an empty admin-secrets
mount and a user-facing "Admin not configured" error for a feature that
was never wired up in the first place.

Now the Secret is committed to the base with an empty admin password and
included in k8s/base/kustomization.yaml, so kubectl apply -k always
creates it. The "Save as Demo" expander is gated on a non-empty password
and is hidden entirely (no error box) when not configured. Operators
enable the feature by patching the live Secret or by editing the file
locally with git update-index --skip-worktree, both documented.
Exception handling in is_admin_configured() is tightened to also catch
StreamlitSecretNotFoundError so a missing secrets file never raises.

https://claude.ai/code/session_01V1noocAR7uXWjWsC9oLGhz
Hide Save-as-Demo UI when admin password is not configured
Split the build+test flow into three stages so the traefik ingress
test no longer rebuilds Dockerfile_simple from scratch:

  build (matrix: full, simple)
    -> uploads each image as a workflow artifact
  test-nginx (matrix: full, simple)
    -> downloads artifact, kind loads, tests nginx ingress
  test-traefik (simple only)
    -> downloads simple artifact, kind loads, tests traefik ingress

Artifacts (not GHCR) are used because the build job only pushes on
non-PR events and fork PRs cannot auth to GHCR at all, so registry
sharing would not work for every PR path.
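
The staged job graph can be sketched as (job bodies elided; step details are assumptions):

```yaml
# Sketch of the three-stage flow
jobs:
  build:
    strategy:
      matrix: { variant: [full, simple] }
    # docker build, then actions/upload-artifact with the image tarball
  test-nginx:
    needs: build
    strategy:
      matrix: { variant: [full, simple] }
    # actions/download-artifact, kind load image-archive, nginx ingress tests
  test-traefik:
    needs: build
    # downloads the simple artifact only, traefik ingress tests
```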
Mirror the build/test-nginx matrix so the traefik ingress test also
covers the full and simple variants instead of just simple.
test-traefik (simple) failed in the combined "Wait for Redis and
deployments to be ready" step because the deployment took longer than
120s to become available, and unlike the test-nginx wait the failure
was not soft. Align test-traefik with test-nginx:

- Split Redis wait (hard, 60s) from deployment wait (soft, `|| true`).
- Bump deployment timeout 120s -> 180s in both jobs.
- Widen the curl warm-up loop from 5x2s to 30x2s in both jobs so a
  marginally late deployment is tolerated; a real failure still
  surfaces via the trailing unconditional curl.
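
The aligned wait/warm-up shape can be sketched as workflow steps (namespace, label selector, and host are assumptions):

```yaml
- name: Wait for Redis (hard failure)
  run: kubectl -n openms rollout status deployment/redis --timeout=60s
- name: Wait for deployments (soft failure)
  run: kubectl -n openms wait --for=condition=Available deploy --all --timeout=180s || true
- name: Warm-up and final check
  run: |
    # 30x2s tolerates a marginally late deployment; the trailing
    # unconditional curl still fails the job on a real outage
    for i in $(seq 1 30); do curl -fsS "http://$HOST/" && break; sleep 2; done
    curl -fsS "http://$HOST/"
```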
The previous skill was a manual find-and-replace checklist that assumed
Claude could run kubectl against the cluster. Restructure it as an
interview-driven file-editing guide with a clear handoff to a human
operator (or CI) for cluster apply.

- Drop kubectl, kubectl kustomize, and rollout-verification steps that
  Claude can't actually execute.
- Drop nginx ingress fallback; production is Traefik-only.
- Add a Step 1 recon over a fixed set of base/overlay/CI files so
  defaults are derived from the repo, and the skill bails on layouts
  it doesn't recognize.
- Replace the manual checklist with six interview questions, each
  paired with what it controls in the running deployment, the proposed
  default, and the reasoning. Slug, GHCR ref, image tag, ingress
  subdomain, memory tier, workspace storage size.
- Make storage a single 1-line edit to k8s/base/workspace-pvc.yaml when
  the user picks a non-default size; keep the PVC base name unchanged
  (namePrefix scopes it per-fork, no collisions).
- Pin the default storage size to 500Gi to match the stock base, so
  the default needs zero file edits.
- Explain that images[0].name is a Kustomize match key and must not
  change.
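
The match-key behavior can be sketched as (fork ref and tag are assumptions):

```yaml
# overlays/prod/kustomization.yaml (sketch)
images:
  - name: ghcr.io/openms/streamlit-template  # match key: must equal the image
                                             # name used in the base manifests
    newName: ghcr.io/openms/my-fork          # what actually gets deployed
    newTag: "1.0.0"
```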
Refactor CI workflow to build images once and reuse across jobs
Refactor k8s deployment skill to interview-driven overlay editing
The shared volume-group: workspaces label and required pod-affinity
attracted every fork's workspace pods onto a single node per memory
tier and deadlocked the first replica of any fork landing on an
otherwise-empty tier (no peer pod for the required affinity to match).

Per-fork RWO PVCs (<slug>-workspaces-pvc) already constrain all of
a fork's workspace-using pods to the node the volume is attached to
via the scheduler's VolumeBinding plugin, so the explicit affinity
adds nothing on top. Removing it scopes co-location naturally to one
fork and lets a fresh tier bootstrap without manual affinity-strip.

NodeSelector continues to pick the memory tier; the RWO mount picks
the specific node within that tier.
The kind integration jobs in build-and-test.yml hardcoded `template-app`
as the slug label and `template.webapps.openms.{de,org}` as the Traefik
hostnames. The configure-k8s-deployment skill rewrites those values when
a fork customizes its overlay, after which `kubectl wait -l app=...`
returns "no matching resources found" and Traefik curl tests hit the
wrong Host header. This broke OpenMS/quantms-web PR #19 on its first
overlay PR (run 24964475081).

Have test-nginx and test-traefik discover SLUG (from `commonLabels.app`)
and TRAEFIK_HOSTS (parsed from the rendered IngressRoute match) right
after deploy, and substitute them into the wait/curl steps. The nginx
hostnames stay hardcoded — they come from `k8s/base/ingress.yaml`, which
the skill never edits and Kustomize doesn't rewrite.

Update the configure-k8s-deployment skill to (a) check during recon that
the workflow uses dynamic discovery, (b) flag forks still on the old
hardcoded shape so the skill applies the patch before editing the
overlay, and (c) note in the handoff that no fork-specific workflow
edits are needed.
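
The discovery step can be sketched in plain shell (the file contents and the Host(...) match shape here are assumptions standing in for the rendered manifests):

```shell
# SLUG from the overlay's commonLabels.app
cat > kustomization.yaml <<'EOF'
commonLabels:
  app: template-app
EOF
SLUG="$(awk '/^commonLabels:/{found=1} found && /app:/{print $2; exit}' kustomization.yaml)"

# Hostnames parsed out of a rendered IngressRoute match expression
MATCH='Host(`template.webapps.openms.de`) || Host(`template.webapps.openms.org`)'
TRAEFIK_HOSTS="$(printf '%s\n' "$MATCH" | grep -o 'Host(`[^`]*`)' | sed 's/Host(`//; s/`)//')"
echo "$SLUG"
echo "$TRAEFIK_HOSTS"
```

Later wait/curl steps then interpolate `$SLUG` and loop over `$TRAEFIK_HOSTS` instead of hardcoding the template's values.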
Remove pod-affinity rules; rely on RWO PVC for co-location
Make CI integration tests discover app slug and hosts dynamically

@t0mdavid-m t0mdavid-m merged commit 915f894 into main Apr 27, 2026
6 checks passed