RFC: BOSH-Provided Dynamic Disks via Volume Services#1453
rkoster wants to merge 2 commits into cloudfoundry:main
Conversation
---
So, this is clearly related to #1401. As referenced in this RFC:
Is this intended as an extension of that RFC, or an alternative?
---
It is unexpected to see a new RFC from a new author rather than questions or suggestions on the original one, #1401. We discussed the approach at the meeting and the majority agreed on it. I will reiterate the points we covered at the meeting.

1. Security: IaaS operations should not originate from workload VMs

This RFC proposes that Diego cells initiate disk create, attach, detach, and delete operations via Agent commands over NATS. Diego cells run untrusted tenant workloads. Today they have no ability to trigger IaaS resource manipulation. This proposal gives them that ability, which is not secure. In the original RFC, all IaaS disk operations are initiated from the control plane. Workload VMs are not part of the disk management request path.

2. Decision initiation: Director vs. Agent

The Agent already sends messages to the Director over NATS (heartbeats, etc.), but those are status reports. The Director is the decision-maker. This RFC changes that: the Agent becomes the initiator of IaaS-mutating decisions ("create this disk," "attach this disk to that VM"), and the Director becomes an execution backend. In the original RFC, the Director remains the decision-maker. Control-plane clients call the Director HTTP API, and the Director decides how and when to execute CPI operations. The Agent's role does not change.

3. Different authorization mechanisms

The original RFC adds HTTP endpoints to the Director using existing authentication (UAA) and existing request patterns. The Agent gets Director-initiated instructions to resolve device symlinks, the same kind of thing it already does. This RFC introduces:
4. Incorrect credential sprawl characterization

Appendix A characterizes the Director HTTP API approach as requiring UAA credentials on every Diego cell. That assumes the volume driver on each cell calls the Director directly, which is not true. A centralized control-plane component can hold Director credentials with disk-management scope only and make the API calls. No credentials on workload VMs.

5. Reconciliation

The Director HTTP API supports centralized reconciliation: a controller observes desired state from BBS, compares it to actual disk state, and converges. Retries, race handling, and edge cases are managed in one place. This RFC only allows retry logic in the volume driver, which is not as robust, e.g.
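The centralized reconciliation pattern argued for in point 5 can be sketched as a simple converge loop. The types and function names below are hypothetical illustrations, not part of either RFC:

```go
package main

import "fmt"

// DiskState is a hypothetical snapshot of one disk's desired state
// (e.g. derived from BBS) and its actual IaaS state.
type DiskState struct {
	DiskID  string
	Desired string // "attached" or "detached"
	Actual  string
}

// Reconcile compares desired and actual state and returns the Director
// API operations needed to converge. Running this in one central
// controller keeps retries and race handling in a single place.
func Reconcile(states []DiskState) []string {
	var ops []string
	for _, s := range states {
		switch {
		case s.Desired == "attached" && s.Actual != "attached":
			ops = append(ops, fmt.Sprintf("attach %s", s.DiskID))
		case s.Desired == "detached" && s.Actual == "attached":
			ops = append(ops, fmt.Sprintf("detach %s", s.DiskID))
		}
	}
	return ops
}

func main() {
	ops := Reconcile([]DiskState{
		{DiskID: "disk-1", Desired: "attached", Actual: "detached"},
		{DiskID: "disk-2", Desired: "detached", Actual: "attached"},
		{DiskID: "disk-3", Desired: "attached", Actual: "attached"},
	})
	fmt.Println(ops)
}
```

In a real controller this loop would run periodically (and on BBS events), so a failed detach is simply retried on the next pass rather than being lost inside a single volume driver.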
6. "Zero Diego changes"

This RFC states "zero changes to Diego, CAPI, or the CF CLI" as a design advantage. The original RFC does not propose any changes to Diego or CAPI either — it is scoped entirely to BOSH. Also, this RFC's own Future Work section proposes
It is a formalisation of an alternative implementation that was discussed in the working group. It also takes on a bigger scope, proposing a clear path for adopting this BOSH feature in the context of Diego and CF.
---
The integration with the volume driver to add support into Cloud Foundry would be a very large win for those of us who do not have access to SMB/NFS solutions on AWS. I reread #1401; it only seems to address adding dynamic disks to BOSH for a Kubernetes project that does not appear to be open sourced. While that isn't a problem in itself, if there is an alternative solution that can benefit Cloud Foundry directly, that would give this proposal quite a bit of weight. I also appreciate that the #1453 solution proposes:
---
- Add Layer 4: Diego Changes section
  - Scope: `private` for exclusive-access volumes (stop-first evacuation)
  - `InstanceIndexedVolumeIds` spec flag (stable volume IDs across restarts)
  - These are independent: `private` scope triggers the evacuation change, `InstanceIndexedVolumeIds` triggers the volume ID suffix change
- Rewrite Summary with security-first value proposition
  - No credential distribution (Agent relay, not API credentials)
  - Per-instance blast radius (no lateral movement)
  - Manifest-driven permissions (easy onboarding)
- Rewrite Problem section with integration challenge framing
  - Who holds IaaS credentials?
  - How are operations authorized?
  - What's the blast radius of compromise?
- Expand Security Model section with detailed explanations
- Update Appendix D to acknowledge remote drivers are supported
  - Centralized proxy is technically feasible
  - Trade-offs: SPOF, per-cell auth still needed, complexity
- Update Future Work: Disk Sets to reference Layer 4
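As a rough illustration of the volume ID suffix idea mentioned above, a stable per-instance volume ID might be derived as below. The naming scheme is a guess for illustration, not what the RFC mandates:

```go
package main

import "fmt"

// InstanceVolumeID derives a stable volume ID by suffixing the base
// volume ID with the instance index, so that instance 0 reattaches the
// same disk after a restart. The exact scheme is hypothetical.
func InstanceVolumeID(baseID string, index int) string {
	return fmt.Sprintf("%s-%d", baseID, index)
}

func main() {
	fmt.Println(InstanceVolumeID("my-volume", 0)) // my-volume-0
	fmt.Println(InstanceVolumeID("my-volume", 1)) // my-volume-1
}
```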
---
@cweibel #1401 does use volume services similar to NFS and SMB. The difference is that the volume driver's responsibility is only to mount/unmount disks for containers, similar to NFS/SMB, whereas in this RFC the volume driver is additionally responsible for attaching and detaching disks to the VM. In the original RFC that is the responsibility of a separate component, deployed separately from the user workload, which watches BBS events and attaches and detaches disks by issuing HTTP API calls to the Director. Disk orchestration is better done in a centralized controller that has the whole picture of what is happening with LRPs, instead of a distributed model where the volume driver is only aware of the current LRP. If the volume driver fails to detach for any reason, who will be responsible for cleaning up and reconciling the disk state? What will happen if the Director becomes unavailable? That is why Cloud Foundry has always had a centralized controller like BBS responsible for reconciliation. In the case of #1401 we have:
As you can see, the #1401 model is actually closer to NFS/SMB. NFS/SMB were never responsible for creating/attaching disks; they operated on existing disks. In fact, the service broker and volume driver code from NFS/SMB can be largely reused for #1401: SMB and NFS already share the same service broker and volume driver code, and the same can be done with dynamic disks.
---
While the end goal of #1401 may be to provide dynamic disks to a closed-source solution (which doesn't bother me), #1453 addresses provisioning dynamic disks and presenting them to application instances inside a CF application. The end goals are very different, but since they overlap in requiring BOSH to manage the lifecycle of allocating disks, I would like these two not to block each other. From what I read of #1401, there is no mention of a service broker or any other orchestration outside of the additional BOSH Director API endpoints to handle dynamic disks. For me, this makes it hard to evaluate where it conflicts with adding support for persistent volumes inside Cloud Foundry, which I would very much like to have.
---
To summarize this RFC's key design points:
---
@cweibel I updated the RFC with the Cloud Foundry use case: https://github.com/mariash/community/blob/dynamic-disks/toc/rfc/rfc-draft-dynamic-disks.md#cloud-foundry-integration Please let me know if this covers your concern.
---
@cweibel to address the rest of your concerns:
Thank you for raising this concern, which @beyhan also mentioned. I added an explicit "Authorization model" section to cover how the current model with UAA scopes will be used to limit access to disk operations: https://github.com/mariash/community/blob/dynamic-disks/toc/rfc/rfc-draft-dynamic-disks.md#authorization-model @rkoster I suggest having a separate RFC for the Director permissions model you proposed in this RFC.
Could you please specify why NATS communication is preferred over the existing Director HTTP API model? The main concern I have is security: it opens a back door on Diego cells to issue IaaS commands to the Director.
This was actually covered in the original RFC under the VM lock, which provides coordination between the VM lifecycle and disk management operations: https://github.com/mariash/community/blob/dynamic-disks/toc/rfc/rfc-draft-dynamic-disks.md#vm-lock
Hopefully the CF use case I added to the RFC will cover this. The difference between the RFCs is the communication channel (NATS vs. HTTP). Calling the bosh-agent from the volume driver is not secure, since that is where workloads are running. Having an HTTP API would allow only authorized users from separate VMs to perform requests. The volume driver could even issue these requests, although I recommend not opening a path from workload VMs and doing it on a control-plane VM instead, which will be possible with the HTTP API and UAA scopes.
This PR adds the RFC "BOSH-Provided Dynamic Disks via Volume Services".
For easier viewing, you can see the full RFC as a preview.
Summary
This RFC proposes a mechanism for BOSH to provide IaaS-managed persistent disks to Diego containers through the existing volume services architecture.
A `permissions` model controls which instance groups may create, attach, detach, and delete disks. Diego's volman discovers this driver automatically — zero changes to Diego, CAPI, or the CF CLI.
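A manifest-driven permissions block along these lines might look as follows; the key names and layout are illustrative guesses, not the RFC's final schema:

```yaml
# Hypothetical BOSH manifest fragment: only the diego-cell instance
# group may perform dynamic-disk operations, and only these four verbs.
disk_permissions:
  - instance_group: diego-cell
    allowed_operations: [create, attach, detach, delete]
```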
This is a foundation technology enabling Cloud Foundry to run stateful single-container workloads — agentic coding sessions, cloud-based developer environments, and long-running AI agent processes.
Key Design Points
`bosh deploy`