
Refactor K8s deployment to use memory-tier components #375

Merged
t0mdavid-m merged 7 commits into main from claude/parallel-webapp-memory-optimization-RoNnJ
Apr 24, 2026

Conversation

@t0mdavid-m
Member

@t0mdavid-m t0mdavid-m commented Apr 24, 2026

Summary

Refactored Kubernetes deployment configuration to replace per-app overlay copies with a single production overlay that selects memory tiers via Kustomize components. This simplifies deployment setup and makes resource allocation explicit and reusable.

Key Changes

  • Eliminated per-app overlay pattern: Replaced k8s/overlays/template-app/ template with a single k8s/overlays/prod/ overlay that all forks use directly. The forked repository itself identifies the app, removing the need to copy overlay directories.

  • Introduced memory-tier components: Created two new Kustomize components (memory-tier-low and memory-tier-high) under k8s/components/ that encapsulate:

    • Node selector patches (openms.de/memory-tier=low|high)
    • Resource requests/limits tuned for each tier (low: 16 GiB worker limit; high: 180 GiB worker limit)
    • Separate patches for streamlit and rq-worker deployments
  • Removed hardcoded resources from base manifests: Stripped resource requests/limits from k8s/base/streamlit-deployment.yaml and k8s/base/rq-worker-deployment.yaml to allow tier components to inject tier-appropriate values.

  • Added cluster-wide LimitRange: New k8s/base/limitrange.yaml sets sensible container defaults (512Mi request, 2Gi default limit) and cluster maximums (64Gi memory, 16 CPU) to prevent resource exhaustion; a sketch follows this list.

  • Updated Redis resources: Increased Redis requests (64Mi → 256Mi memory, 50m → 100m CPU) and aligned limits to match requests for stability.

  • Updated documentation and CI: Modified skill guide, deployment docs, and GitHub Actions workflows to reference k8s/overlays/prod/ instead of per-app overlays and added memory-tier selection as a deployment step.
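
For reference, a minimal sketch of what the new LimitRange could look like, assuming the names and values described above (the authoritative manifest is k8s/base/limitrange.yaml; note that a later commit in this PR raises max.memory to 200Gi):

# k8s/base/limitrange.yaml (sketch; values taken from the description above)
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container omits requests
        memory: 512Mi
        cpu: 250m
      default:          # applied when a container omits limits
        memory: 2Gi
        cpu: "2"
      max:              # admission rejects containers above this
        memory: 64Gi
        cpu: "16"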

Implementation Details

  • Memory-tier components use JSON Patch operations to add nodeSelector to all Deployments, then override resource patches for specific workloads (streamlit, rq-worker); a sketch follows this list.
  • The prod overlay includes memory-tier-low by default; users switch to memory-tier-high only for genuinely memory-intensive workloads (DIA spectral-library, OpenSwath, DIA-LFQ).
  • Cluster nodes must be pre-labelled with openms.de/memory-tier=low|high for the node selectors to function.
  • All CI workflows updated to validate and deploy using k8s/overlays/prod/ instead of the template overlay.
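
To make the tier selection concrete, here is a hedged sketch of how the prod overlay and a component's node-selector patch fit together (contents are illustrative; the authoritative files live under k8s/):

# k8s/overlays/prod/kustomization.yaml (sketch) -- the one-line tier selector
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
components:
  - ../../components/memory-tier-low   # switch to memory-tier-high for heavy workloads

# k8s/components/memory-tier-low/nodeselector.yaml (sketch) -- RFC 6902 JSON Patch
- op: add
  path: /spec/template/spec/nodeSelector
  value:
    openms.de/memory-tier: low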

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP

Summary by CodeRabbit

Release Notes

  • New Features

    • Added memory tier configuration options (low/high) for deployment resource allocation and pod scheduling.
  • Documentation

    • Updated deployment guide to reflect streamlined configuration structure.
  • Chores

    • Simplified shared production deployment configuration; updated CI/CD validation workflow accordingly.
    • Applied default container resource limits and adjusted resource allocations for system components to improve resource management.

Factor node placement and memory sizing out of the base manifests into
reusable Kustomize components (memory-tier-low / memory-tier-high), so
each fork picks its tier with a single line in its overlay.

- base: remove per-pod `resources` from streamlit and rq-worker
  Deployments; sizing now comes from the tier component
- base: promote redis to Guaranteed QoS (requests == limits for both
  cpu and memory) so it sits at the bottom of the kernel's OOM-kill list
- base: add LimitRange so containers without explicit resources inherit
  safe defaults (512Mi/250m request, 2Gi/2 limit, 64Gi/16 max)
- components/memory-tier-low: nodeSelector=low, streamlit 512Mi/2Gi,
  rq-worker 1Gi/16Gi (Burstable)
- components/memory-tier-high: nodeSelector=high, streamlit 512Mi/4Gi,
  rq-worker 2Gi/180Gi (Burstable — uniform across heavy workers so a
  single active app can burst into the shared pool)
- overlays: rename template-app/ to prod/ (one overlay per repo; the
  repo itself identifies the app) and pull in memory-tier-low
- docs & skill: document the new overlays/prod/ path and the one-line
  tier selector; update CI to kustomize the renamed overlay

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@t0mdavid-m has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 39 minutes and 43 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 43 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5ae586da-adb4-42a9-8e5f-456f92638417

📥 Commits

Reviewing files that changed from the base of the PR and between 0bd2ccf and 43c300b.

📒 Files selected for processing (2)
  • .github/kind-config.yaml
  • .github/workflows/build-and-test.yml
📝 Walkthrough

Switches from per-app to shared production overlay architecture, adds configurable memory-tier components for resource management, introduces cluster-wide LimitRange defaults, adjusts container resource specifications, and updates deployment documentation and CI workflows accordingly.

Changes

  • Documentation Updates (.claude/skills/configure-k8s-deployment.md, docs/kubernetes-deployment.md): Updated to reference the shared k8s/overlays/prod/ instead of per-app overlays, add a memory-tier selection step, remove template-overlay expectations, and document the new LimitRange and memory-tier components.
  • CI Workflow (.github/workflows/build-and-test.yml): Modified to target k8s/overlays/prod/ for manifest validation and deployment; adds a node labeling step with openms.de/memory-tier=low before the Kustomize apply in both kind-based jobs.
  • Base Kubernetes Resources (k8s/base/kustomization.yaml, k8s/base/limitrange.yaml, k8s/base/redis.yaml, k8s/base/rq-worker-deployment.yaml, k8s/base/streamlit-deployment.yaml): Added a LimitRange resource for container defaults/limits/max values; adjusted Redis requests and limits; removed explicit resource constraints from the rq-worker and streamlit Deployments.
  • Memory-Tier Low Component (k8s/components/memory-tier-low/kustomization.yaml, nodeselector.yaml, streamlit-resources.yaml, worker-resources.yaml): New Kustomize component that applies a nodeSelector patch for openms.de/memory-tier: low and resource specifications (streamlit: 512Mi/500m requests, 2Gi/4 limits; rq-worker: 256Mi/250m requests, 512Mi/500m limits).
  • Memory-Tier High Component (k8s/components/memory-tier-high/kustomization.yaml, nodeselector.yaml, streamlit-resources.yaml, worker-resources.yaml): New Kustomize component that applies a nodeSelector patch for openms.de/memory-tier: high and resource specifications with higher requests and limits for streamlit and rq-worker, for demanding workloads.
  • Production Overlay (k8s/overlays/prod/kustomization.yaml): Updated to include the memory-tier-low component by default, enabling tier selection by swapping the component reference.

Possibly related PRs

Poem

🐰 From templates scattered, one overlay shines,
Memory tiers dancing in configurable lines,
LimitRange wisdom keeps chaos at bay,
Resources now flexible—low tier or high—hooray! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed (the title 'Refactor K8s deployment to use memory-tier components' directly and clearly summarizes the main change: introducing Kustomize memory-tier components to the Kubernetes deployment architecture).
  • Docstring Coverage: ✅ Passed (no functions found in the changed files to evaluate; docstring coverage check skipped).
  • Linked Issues Check: ✅ Passed (check skipped because no linked issues were found for this pull request).
  • Out of Scope Changes Check: ✅ Passed (check skipped because no linked issues were found for this pull request).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

t0mdavid-m and others added 5 commits April 24, 2026 11:44
The memory-tier-low component adds nodeSelector
openms.de/memory-tier=low to every Deployment. kind clusters have no
such label, so after the rename to overlays/prod all pods stayed
Pending and 'Wait for Redis to be ready' timed out.

Label --all kind nodes in both the nginx and Traefik integration jobs
before deploying so the nodeSelector matches.

Also raise the LimitRange max.memory from 64Gi to 200Gi. The original
cap was written before memory-tier-high settled on a 180Gi rq-worker
limit; without the bump, a high-tier fork (e.g. OpenDIAKiosk) would be
rejected by admission when deployed into the shared openms namespace
after the template's LimitRange is applied.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Completes the overlay rename started in 6c61365 now that the branch
has merged main, which added the example file under the old path.

Also rewrite two remaining docs references to overlays/<your-app-name>/
and the CI description to the new prod overlay.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
Spin up a 2-node kind cluster (control-plane labeled memory-tier=low
+ ingress-ready, worker labeled memory-tier=high) so the Build-and-Test
job passes regardless of which memory-tier component a fork's overlay
pulls in. Previously we labeled --all nodes with a single tier after
creation, which broke as soon as a fork flipped memory-tier-low to
memory-tier-high.

- .github/kind-config.yaml: 2-node topology with per-node labels.
- .github/workflows/build-and-test.yml: point both helm/kind-action
  invocations (nginx build + traefik-integration) at the config and
  drop the now-redundant dynamic label step.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
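
A rough sketch of the 2-node topology this commit describes (the actual .github/kind-config.yaml may differ in detail):

# .github/kind-config.yaml (sketch of the 2-node kind cluster described above)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    labels:
      openms.de/memory-tier: "low"
      ingress-ready: "true"
  - role: worker
    labels:
      openms.de/memory-tier: "high"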

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
k8s/base/limitrange.yaml (1)

1-16: LGTM — sane guardrails; two small operational notes.

  • max.memory: 200Gi correctly leaves headroom for the 180 GiB high-tier rq-worker limit.
  • Be aware that the default limit (memory: 2Gi, cpu: 2) will be silently applied to any future container that lacks explicit limits — init containers, sidecars, new services. For something like a DB migration init-container or log shipper that legitimately needs more, this can produce surprising OOMKills. Not a blocker for this PR; worth a short comment in the file or a line in docs/kubernetes-deployment.md so fork maintainers know to set explicit limits for any new workload.
  • Consider setting a maxLimitRequestRatio in a follow-up to prevent accidental over-commit (e.g., a request of 512Mi with a 200Gi limit passing validation).
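
For illustration, the suggested follow-up could look roughly like this (values are placeholders, not part of this PR; a strict ratio would also constrain the intentionally bursty high-tier worker, so the value needs care):

# Possible follow-up inside the LimitRange: cap how far a limit may exceed its request
spec:
  limits:
    - type: Container
      maxLimitRequestRatio:
        memory: "8"   # e.g. a 512Mi request could then carry at most a 4Gi limit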
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/limitrange.yaml` around lines 1 - 16, Add an explanatory comment to
the LimitRange resource (metadata.name: default-container-limits, kind:
LimitRange) noting that the provided default (spec.limits[*].default memory:
"2Gi", cpu: "2") will be silently applied to any container without explicit
limits (including init containers and sidecars) and can cause OOMKills for
workloads that legitimately need more; also add a short note in
docs/kubernetes-deployment.md advising maintainers to set explicit limits for
special-case containers and to validate new workloads, and consider a follow-up
change to set spec.limits[*].maxLimitRequestRatio to prevent accidental
over-commit.
k8s/components/memory-tier-low/nodeselector.yaml (1)

1-4: RFC 6902 add semantics: patch will replace existing nodeSelector.

The add operation at /spec/template/spec/nodeSelector follows RFC 6902 semantics, which performs a replace rather than merge on existing paths. If a base Deployment ever includes a nodeSelector, this patch will silently overwrite it. Base manifests currently don't define nodeSelector, so this is safe today — flagging as a gotcha for fork maintainers. For stronger robustness, consider a strategic-merge patch so selectors merge instead of replace.
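
A strategic-merge variant of the same patch could look like this (sketch; one such patch per targeted Deployment, e.g. streamlit and rq-worker):

# Strategic-merge alternative: merges into an existing nodeSelector instead of replacing it
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streamlit   # one patch file per targeted Deployment
spec:
  template:
    spec:
      nodeSelector:
        openms.de/memory-tier: low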

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/components/memory-tier-low/nodeselector.yaml` around lines 1 - 4, The
patch uses an RFC6902 "add" at /spec/template/spec/nodeSelector which will
replace any existing nodeSelector instead of merging; change this to a
strategic-merge style patch so the nodeSelector map merges with any existing
selectors rather than overwriting them. Replace the JSON-patch add with a
strategic-merge patch (or a kustomize patchStrategicMerge) that sets
nodeSelector: { "openms.de/memory-tier": "low" } under spec.template.spec so
existing keys are preserved and only the new key is added/updated.
.github/workflows/build-and-test.yml (1)

117-118: Tier label is hardcoded; forks switching to memory-tier-high will silently break CI.

Both deployment jobs label all kind nodes with openms.de/memory-tier=low. If a fork flips k8s/overlays/prod/kustomization.yaml to memory-tier-high, pods get a nodeSelector: openms.de/memory-tier=high, no node matches, and the kubectl wait steps just time out — the CI failure won't point at the label.

Consider deriving the tier from the overlay (or labelling each node with both tiers in CI, since there's only one node per kind cluster) so the workflow keeps working under either component selection.

♻️ Option: label the single kind node with both tiers so CI is agnostic
-      - name: Label kind node with the tier the overlay expects
-        run: kubectl label nodes --all openms.de/memory-tier=low --overwrite
+      - name: Label kind nodes with both memory tiers (overlay-agnostic)
+        run: |
+          kubectl label nodes --all openms.de/memory-tier=low --overwrite
+          # Also apply high so either component in the prod overlay schedules.
+          # Note: a nodeSelector picks one value; labeling with both is safe
+          # because only the selected component's patch is applied.
+          kubectl label nodes --all openms.de/memory-tier- --overwrite 2>/dev/null || true

Note: a node can only carry one value for openms.de/memory-tier at a time, so a cleaner fix is to parse the component from k8s/overlays/prod/kustomization.yaml and label accordingly, e.g.:

TIER=$(grep -oE 'memory-tier-(low|high)' k8s/overlays/prod/kustomization.yaml | head -n1 | sed 's/memory-tier-//')
kubectl label nodes --all "openms.de/memory-tier=${TIER}" --overwrite
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-and-test.yml around lines 117 - 118, The workflow
currently hardcodes the node label via the kubectl label command (kubectl label
nodes --all openms.de/memory-tier=low --overwrite), which breaks when an overlay
uses memory-tier-high; update the job to derive the tier from
k8s/overlays/prod/kustomization.yaml (e.g., grep/parsing to extract
memory-tier-(low|high) into a TIER variable) and then call kubectl label nodes
--all "openms.de/memory-tier=${TIER}" --overwrite so the CI labels match the
selected overlay; alternatively, if you prefer a simpler change, label the
single kind node with the appropriate tier value dynamically rather than leaving
it hardcoded.
k8s/components/memory-tier-high/worker-resources.yaml (1)

10-16: Consider raising the memory request and dropping the CPU limit.

Two design issues worth reconsidering on the high-tier worker:

  1. Memory request (2Gi) vs limit (180Gi) gives a 90× burst range. The scheduler only reserves 2Gi, so another workload could legally land on the same node despite the worker's legitimate need for ~180Gi. Even with nodeSelector: openms.de/memory-tier=high (presumably dedicating the node), set requests closer to steady-state footprint (e.g., 8–32Gi) to prevent overcommit.

  2. CPU limit of 20 cores is likely unnecessary. RQ workers are bursty batch jobs; a hard CPU limit only causes CFS throttling without clear benefit. Dropping the CPU limit (keeping the request for scheduling) is the standard pattern for background workers.

The LimitRange max.memory: 200Gi will admit the 180Gi limit without issue.
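
If both suggestions were applied, the high-tier worker resources might end up roughly like this (illustrative; the 16Gi request is the reviewer's example range, not a measured steady-state figure):

# Sketch of the suggested adjustment for the high-tier rq-worker
resources:
  requests:
    memory: "16Gi"   # reserve closer to steady-state footprint
    cpu: "2"
  limits:
    memory: "180Gi"  # unchanged burst ceiling
    # no cpu limit, to avoid CFS throttling of bursty batch work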

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/components/memory-tier-high/worker-resources.yaml` around lines 10 - 16,
The resources block currently sets memory request to "2Gi" and memory limit to
"180Gi" with a cpu limit of "20", which under-reserves memory and unnecessarily
caps CPU; update the resources for the high-tier worker by increasing
requests.memory to a steady-state value in the 8Gi–32Gi range (e.g., 16Gi) so
the scheduler reserves sufficient RAM, keep or adjust requests.cpu as needed
(requests.cpu currently "2"), and remove the limits.cpu entry entirely (drop the
hard CPU limit) so the pod is not CFS-throttled; adjust the resources.requests
and resources.limits keys accordingly and ensure nodeSelector:
openms.de/memory-tier=high remains if you rely on node isolation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3206d92b-d0b2-41e1-824b-5ab783103152

📥 Commits

Reviewing files that changed from the base of the PR and between 971cfdd and 0bd2ccf.

📒 Files selected for processing (18)
  • .claude/skills/configure-k8s-deployment.md
  • .github/workflows/build-and-test.yml
  • docs/kubernetes-deployment.md
  • k8s/base/kustomization.yaml
  • k8s/base/limitrange.yaml
  • k8s/base/redis.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/streamlit-deployment.yaml
  • k8s/components/memory-tier-high/kustomization.yaml
  • k8s/components/memory-tier-high/nodeselector.yaml
  • k8s/components/memory-tier-high/streamlit-resources.yaml
  • k8s/components/memory-tier-high/worker-resources.yaml
  • k8s/components/memory-tier-low/kustomization.yaml
  • k8s/components/memory-tier-low/nodeselector.yaml
  • k8s/components/memory-tier-low/streamlit-resources.yaml
  • k8s/components/memory-tier-low/worker-resources.yaml
  • k8s/overlays/prod/kustomization.yaml
  • k8s/overlays/prod/streamlit-secrets.yaml.example
💤 Files with no reviewable changes (2)
  • k8s/base/streamlit-deployment.yaml
  • k8s/base/rq-worker-deployment.yaml

Comment thread k8s/base/redis.yaml
Comment on lines 22 to +28
           resources:
             requests:
-              memory: "64Mi"
-              cpu: "50m"
+              memory: "256Mi"
+              cpu: "100m"
             limits:
               memory: "256Mi"
-              cpu: "250m"
+              cpu: "100m"

⚠️ Potential issue | 🟡 Minor

Reconsider dropping Redis CPU limit from 250m → 100m.

Setting requests == limits for both CPU and memory correctly promotes Redis to Guaranteed QoS, which is good. However, capping CPU at 100m is quite aggressive: Redis is largely single-threaded but can burst to a full core during AOF/RDB rewrites, expiration sweeps, large MULTI/EVAL executions, or client reconnect storms. Under throttling, these bursts will translate into tail-latency spikes that affect rq-worker enqueue/dequeue throughput across the whole app.

Unless you've observed Redis consistently consuming <100m under load, consider keeping the limit at 250m (and matching the request to preserve Guaranteed QoS):

♻️ Suggested adjustment
           resources:
             requests:
               memory: "256Mi"
-              cpu: "100m"
+              cpu: "250m"
             limits:
               memory: "256Mi"
-              cpu: "100m"
+              cpu: "250m"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (the committable replacement for the lines above):
           resources:
             requests:
               memory: "256Mi"
               cpu: "250m"
             limits:
               memory: "256Mi"
               cpu: "250m"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/base/redis.yaml` around lines 22 - 28, The Redis container resources
currently set both requests and limits for cpu to "100m" (under the resources
block) are likely too low and can cause CPU throttling; update the cpu request
and cpu limit values in the resources: requests/limits section for the Redis
container to "250m" (keeping requests == limits to preserve Guaranteed QoS) so
bursts (AOF/RDB rewrites, expirations, MULTI/EVAL) are not throttled.

Comment on lines +10 to +16
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "4"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find all streamlit resource files in memory-tier directories
find . -type f -name "streamlit-resources.yaml" | head -20

Repository: OpenMS/streamlit-template

Length of output: 186


🏁 Script executed:

# Find rq-worker resource files to compare scaling
find . -type f -name "*rq-worker*" -o -name "*rq_worker*" | head -20

Repository: OpenMS/streamlit-template

Length of output: 106


🏁 Script executed:

# Check the directory structure of k8s/components
ls -la k8s/components/ 2>/dev/null | head -30

Repository: OpenMS/streamlit-template

Length of output: 307


🏁 Script executed:

# Read the low-tier streamlit resources
cat -n k8s/components/memory-tier-low/streamlit-resources.yaml

Repository: OpenMS/streamlit-template

Length of output: 488


🏁 Script executed:

# Read the high-tier streamlit resources (from review)
cat -n k8s/components/memory-tier-high/streamlit-resources.yaml

Repository: OpenMS/streamlit-template

Length of output: 488


🏁 Script executed:

# Read the rq-worker deployment to check resource scaling
cat -n k8s/base/rq-worker-deployment.yaml

Repository: OpenMS/streamlit-template

Length of output: 2240


🏁 Script executed:

# Look for kustomization files that might override resources per tier
find ./k8s -type f -name "kustomization.yaml" -o -name "kustomization.yml"

Repository: OpenMS/streamlit-template

Length of output: 243


🏁 Script executed:

# Check if there are overlays or patches for rq-worker in memory-tier directories
find ./k8s/components/memory-tier-high -type f | head -20

Repository: OpenMS/streamlit-template

Length of output: 289


🏁 Script executed:

# Search for any references to 180Gi in the codebase
rg "180" --type yaml --type md

Repository: OpenMS/streamlit-template

Length of output: 302


🏁 Script executed:

# Read the high-tier worker resources
cat -n k8s/components/memory-tier-high/worker-resources.yaml

Repository: OpenMS/streamlit-template

Length of output: 486


🏁 Script executed:

# Check if low-tier has worker resources too
find ./k8s/components/memory-tier-low -type f

Repository: OpenMS/streamlit-template

Length of output: 285


🏁 Script executed:

# Read the high-tier kustomization to understand how patches are applied
cat -n k8s/components/memory-tier-high/kustomization.yaml

Repository: OpenMS/streamlit-template

Length of output: 489


🏁 Script executed:

# Read the low-tier worker resources to see the baseline
cat -n k8s/components/memory-tier-low/worker-resources.yaml

Repository: OpenMS/streamlit-template

Length of output: 487


🏁 Script executed:

# Search for any README or documentation explaining the memory tier design
find ./k8s -type f \( -name "README*" -o -name "*.md" \) | xargs grep -l "memory-tier" 2>/dev/null || echo "No matches found"

Repository: OpenMS/streamlit-template

Length of output: 86


Document the asymmetric scaling design for memory tiers.

The 2× memory increase for streamlit (2Gi → 4Gi) is correct, but the context reveals a significant asymmetry: rq-worker scales 11× (16Gi → 180Gi) while streamlit barely scales. For a component labeled "memory-tier-high," this design choice will likely confuse users — they may expect streamlit to also support memory-intensive workloads when switching tiers.

Add a comment to both streamlit-resources.yaml and worker-resources.yaml explaining that streamlit stays lightweight while rq-worker handles the compute scaling in high-tier deployments. This sets correct expectations and prevents users from expecting streamlit to accommodate large in-memory dataframes under higher tiers.
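
One way the suggested comment could read, placed directly above the resources block in both tier files (wording illustrative; the values shown are the high-tier streamlit ones from this patch):

# NOTE: the high memory tier is intentionally asymmetric. streamlit stays
# lightweight (limit grows only 2Gi -> 4Gi) and is not meant to hold large
# in-memory workloads; rq-worker is the component that scales (up to 180Gi)
# for memory-intensive jobs.
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "4Gi"
    cpu: "4"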

🧰 Tools
🪛 Checkov (3.2.524)

[medium] 1-16: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-16: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Trivy (0.69.3)

[error] 9-16: Root file system is not read-only

Container 'streamlit' of Deployment 'streamlit' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 9-16: Default security context configured

container streamlit in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 7-16: Default security context configured

deployment streamlit in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/components/memory-tier-high/streamlit-resources.yaml` around lines 10 -
16, Add a short explanatory comment in both streamlit-resources.yaml and
worker-resources.yaml near the resources block (the "resources: requests:
memory/cpu limits: memory/cpu" section) stating that the high memory tier is
intentionally asymmetric: streamlit remains lightweight (only increases from
2Gi→4Gi) and is not intended for large in-memory workloads, while rq-worker is
the component that scales to handle heavy memory/compute (e.g., 16Gi→180Gi);
place the comment directly above the resources block in each file and mention
the rationale so users won't expect streamlit to serve large in-memory
dataframes.

Comment on lines +7 to +9
components:
- ../../components/memory-tier-low


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the component kustomization files to see how the nodeselector patch is targeted.
fd -a 'kustomization.yaml' k8s/components
echo '--- low ---'
cat k8s/components/memory-tier-low/kustomization.yaml 2>/dev/null
echo '--- high ---'
cat k8s/components/memory-tier-high/kustomization.yaml 2>/dev/null

Repository: OpenMS/streamlit-template

Length of output: 863


🏁 Script executed:

#!/bin/bash
# Check if Redis is a Deployment and inspect the nodeselector patch
echo "=== Redis definition ==="
cat k8s/base/redis.yaml 2>/dev/null | head -20

echo ""
echo "=== Nodeselector patch content (low) ==="
cat k8s/components/memory-tier-low/nodeselector.yaml 2>/dev/null

echo ""
echo "=== Base directory structure ==="
ls -la k8s/base/ 2>/dev/null | grep -E '\.yaml$'

Repository: OpenMS/streamlit-template

Length of output: 1407


Node-selector patch will apply to Redis Deployment unintentionally.

The nodeselector.yaml patch in both memory-tier-low and memory-tier-high components targets kind: Deployment without filtering by name, so it will inject openms.de/memory-tier=low (or high) onto all Deployments in the base—including the Redis Deployment at k8s/base/redis.yaml.

Downstream effects:

  • Redis becomes unschedulable on any cluster where only app-workload nodes carry the memory-tier label.
  • Forks that switch to memory-tier-high will force Redis onto expensive high-memory nodes, despite Redis only requesting 256 Mi.
  • CI testing masks this because .github/workflows/build-and-test.yml labels all nodes with openms.de/memory-tier=low.

Fix: Narrow the nodeselector.yaml patch target in each component's kustomization.yaml to the two app Deployments only (e.g., via target.name: streamlit|rq-worker, or split into per-Deployment patches), so Redis is unaffected.
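
A sketch of the narrower targeting in each component's kustomization.yaml (exact syntax depends on the kustomize version; the Deployment names streamlit and rq-worker are those referenced above):

# k8s/components/memory-tier-low/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  - path: nodeselector.yaml
    target:
      kind: Deployment
      name: "streamlit|rq-worker"   # regex match; leaves the redis Deployment untouched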

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/overlays/prod/kustomization.yaml` around lines 7 - 9, The nodeselector
patch in the memory-tier components is too broad and will match all Deployments
(including the Redis Deployment); update each component's kustomization.yaml to
restrict the nodeselector.yaml patch so it only targets the app Deployments
(e.g., set patchStrategicMerge or patches with a target block referencing the
patch file nodeselector.yaml and add target.kind: Deployment plus target.name:
streamlit|rq-worker or create separate per-deployment patches with target.name:
streamlit and target.name: rq-worker), ensuring the Redis Deployment (defined in
the base as redis.yaml) is not modified by these components.

Previous run (2f28ed9) showed build + traefik-integration jobs still
timing out on 'Wait for Redis'. Root cause: multi-node kind clusters
apply node-role.kubernetes.io/control-plane:NoSchedule to the
control-plane, which untolerated app pods can't land on even though
the nodeSelector matches. The single-node kind used previously had
no such taint, which is why CI worked until we added a second node.

Add a kubeadmConfigPatches stanza setting nodeRegistration.taints to
the empty list so the control-plane is schedulable. Labels and
cluster shape (1 control-plane + 1 worker) stay the same.

https://claude.ai/code/session_01LW4iBWt5YftuqFGc3jM5ZP
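
The taint removal described above could be expressed roughly like this in the kind config (sketch; it extends the control-plane node entry sketched earlier, and the exact placement may differ):

# .github/kind-config.yaml (sketch): register the control-plane without taints
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          taints: []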
@t0mdavid-m t0mdavid-m merged commit 64f43e2 into main Apr 24, 2026
9 checks passed