Optimize ephemeral volume storage capacity for concurrent workspaces #1

@blink-so

Problem

During the Sept 30 Agentic Workshop, node storage capacity for ephemeral volumes ran short when ~10 users ran workloads simultaneously, causing workspaces to die and restart. The self-healing mechanisms that restarted them wiped user progress.

Root Cause

Node storage capacity planning did not account for concurrent workspace workloads competing for ephemeral volume resources.

Requirements

  • Audit current ephemeral volume allocation per node
  • Calculate storage requirements for target concurrent workspace count
  • Implement storage capacity monitoring and alerting
  • Define resource limits per workspace to prevent a single workspace from exhausting node storage
  • Test with realistic concurrent user load (use monthly workshops for validation)
  • Document storage capacity planning methodology
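
As a starting point for the capacity calculation, the arithmetic could look roughly like the sketch below. It assumes a fixed per-workspace ephemeral storage limit and a fixed reserve per node for images, logs, and system use. The names `NodeStorage`, `workspaces_per_node`, and `nodes_needed`, and all GiB figures, are hypothetical placeholders, not values measured on the cluster.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStorage:
    """Ephemeral storage on one node, in GiB (hypothetical figures)."""
    allocatable_gib: int  # storage the node can offer to ephemeral volumes
    reserved_gib: int     # assumed reserve for images, logs, and system use


def workspaces_per_node(node: NodeStorage, per_workspace_limit_gib: int) -> int:
    """How many workspaces fit on a node if each may use its full limit."""
    usable_gib = node.allocatable_gib - node.reserved_gib
    if usable_gib <= 0 or per_workspace_limit_gib <= 0:
        return 0
    return usable_gib // per_workspace_limit_gib


def nodes_needed(target_workspaces: int, node: NodeStorage,
                 per_workspace_limit_gib: int) -> int:
    """Nodes required to host the target workspace count without contention."""
    per_node = workspaces_per_node(node, per_workspace_limit_gib)
    if per_node == 0:
        raise ValueError("a single workspace limit exceeds usable node storage")
    return math.ceil(target_workspaces / per_node)


if __name__ == "__main__":
    # Assumed example: 100 GiB nodes, 20 GiB reserved, 10 GiB limit per workspace.
    node = NodeStorage(allocatable_gib=100, reserved_gib=20)
    print(workspaces_per_node(node, 10))  # 8
    print(nodes_needed(20, node, 10))     # 3
```

Sizing against the per-workspace limit rather than average usage is a deliberately conservative choice: it keeps the node within capacity even if every workspace fills its allowance at the same time. Real figures from the audit should replace the placeholders.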

Success Criteria

  • Support 20+ concurrent workspaces without storage contention
  • Alerting triggers before storage usage reaches critical thresholds
  • No workspace restarts due to storage issues during monthly workshops
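
The alerting criterion could be prototyped with a simple two-level threshold check like the one below. This is only a sketch: the 75% warning and 90% critical thresholds and the `classify_usage` helper are assumptions for illustration, and a real deployment would more likely express the same rule in the cluster's monitoring stack. `shutil.disk_usage` is the standard-library call used here to read usage for a path.

```python
import shutil

# Assumed thresholds; tune them against observed workshop load.
WARNING_FRACTION = 0.75
CRITICAL_FRACTION = 0.90


def classify_usage(used_bytes: int, total_bytes: int,
                   warning: float = WARNING_FRACTION,
                   critical: float = CRITICAL_FRACTION) -> str:
    """Return 'ok', 'warning', or 'critical' for a storage usage ratio."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= critical:
        return "critical"
    if fraction >= warning:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Classify current usage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used, usage.total)


if __name__ == "__main__":
    # "/" is a placeholder; point this at the ephemeral volume mount in practice.
    print(check_path("/"))
```

Firing a warning well below the critical level is what gives operators time to act before workspaces are evicted or restarted.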

Related

Sept 30 Workshop Postmortem
