Create an officially supported Helm chart for deploying OpenShell onto existing Kubernetes infrastructure. Ensure OpenShell deployments on Kubernetes are suitable for enterprise use. Establish a plan for making OpenShell components enterprise-ready.
Overview
- Helm Chart
- Publish as OCI resource on ghcr.io
- Provide Ingress to OpenShell Gateway via Kubernetes Gateway API (this should be configurable so we can disable ingress)
- Control plane (gateway) authentication and authorization (Keycloak to start)
- High Availability Gateway
- Supports horizontal scaling, rollouts, and client connection rebalancing
- Ensure Postgres support is fully tested. (TODO: can we use etcd or some other database solution?)
- Configuring Sandboxes
- Specify memory, cpu, and other pod specs.
- Reduce the need for privileged security capabilities (e.g., no more CAP_NET_ADMIN)
- The OpenShell supervisor is injected into the container via image volumes. (SAP to contribute the implementation)
- Observability (includes both gateway and sandbox health)
- Logs exporting
- Metrics (Prometheus, already landed)
- Dashboards (Grafana/similar)
- Resiliency
- Sandboxes are resilient to network disconnection between supervisor and openshell-server for Y period of time.
- Clarify the supervisor-to-openshell-server heartbeat, possibly with some jitter to avoid thundering-herd scenarios. These are basic service-like heartbeats; the scope is monitoring tens of thousands of agents.
- Upgrading OpenShell
- Gateway data/schemas are migrated as necessary on upgrades
- Sandbox containers can be rolled to a new version of the supervisor. This needs to be controllable and configurable for existing sandboxes.
- Sandbox containers themselves can be updated
- Unified gateway configuration file (see RFC 0002 - Gateway Configuration File)
- Kubernetes Operator
- We need a way to develop and test Kubernetes features locally.
- k3s + Skaffold, Tilt, or some other dev script
- Test infrastructure
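The heartbeat item under Resiliency can be illustrated with a minimal sketch of jittered heartbeat scheduling. This is not the OpenShell implementation: the function names, the 20% jitter fraction, and the "three missed intervals" staleness rule are assumptions chosen for illustration only.

```python
import random
from typing import Optional


def next_heartbeat_delay(
    base_seconds: float,
    jitter_fraction: float = 0.2,  # assumed value, not from the plan
    rng: Optional[random.Random] = None,
) -> float:
    """Return the delay before the next heartbeat.

    The delay is drawn uniformly from
    [base * (1 - jitter), base * (1 + jitter)] so that many supervisors
    started at the same moment drift apart instead of reporting in lockstep.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    rng = rng or random.Random()
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


def is_stale(
    last_seen: float,
    now: float,
    base_seconds: float,
    jitter_fraction: float = 0.2,
    missed_allowed: int = 3,  # hypothetical tolerance
) -> bool:
    """Server-side check: has a supervisor missed too many heartbeats?

    The threshold uses the longest possible jittered interval, so a
    supervisor that always draws the maximum delay is not flagged early.
    """
    max_interval = base_seconds * (1 + jitter_fraction)
    return (now - last_seen) > missed_allowed * max_interval
```

A supervisor loop would sleep for `next_heartbeat_delay(...)` between reports, while the server would call `is_stale(...)` with its own clock. The tolerance window here is separate from the disconnection period Y above, which remains to be defined.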
Milestones
M1 - MVP
- Helm Chart
- Developer loop
- OpenShell server accepts a kubeconfig and is decoupled from k3s
- Documentation with RBAC details for running inside an existing Kubernetes cluster
- Parameterize e2e tests to point at any cluster
- Unified gateway config
- Initial documentation and deployment guidance available
M2 - reliability
- Test coverage on OpenShift and major Kubernetes distributions
- Security context privileges are dropped
- Implement runtime class configs: Kata and gVisor
- Gateway can horizontally scale
- Agent- and operator-friendly observability (gateway and sandbox)
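As a rough illustration of how the dropped-privileges and runtime-class items might surface in a sandbox pod spec, the sketch below builds a Pod manifest as a plain dictionary. The function name `build_sandbox_pod`, the default resource values, and the runtime class names passed in are hypothetical; the manifest fields themselves (`runtimeClassName`, `resources`, `securityContext.capabilities.drop`) are standard Kubernetes Pod fields. RuntimeClass names such as "gvisor" or "kata" depend on what the cluster administrator has installed.

```python
from typing import Any, Dict, Optional


def build_sandbox_pod(
    name: str,
    image: str,
    cpu: str = "500m",       # assumed default, for illustration
    memory: str = "512Mi",   # assumed default, for illustration
    runtime_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a Pod manifest (plain dict) for a hypothetical sandbox."""
    container: Dict[str, Any] = {
        "name": "sandbox",
        "image": image,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        },
        "securityContext": {
            # No privileged mode and no added capabilities
            # (so no CAP_NET_ADMIN).
            "privileged": False,
            "allowPrivilegeEscalation": False,
            "capabilities": {"drop": ["ALL"]},
        },
    }
    spec: Dict[str, Any] = {"containers": [container]}
    if runtime_class is not None:
        # Must match a RuntimeClass object that exists in the cluster.
        spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }
```

In a Helm chart these values would more likely be exposed as chart values and rendered by templates; the dictionary form is used here only to keep the example self-contained.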
M3 - operating OpenShell at scale
- OpenShell Kubernetes Operator
- Support upgrading/snapshotting sandboxes