[Feature][QDP] Add MPI-ready distributed amplitude execution scaffolding #1296
viiccwen wants to merge 8 commits into apache:main
Conversation
Nice one, will probably take a look on Thursday.
ryankert01 left a comment
Hi, nice initiative. Curious whether it can be plugged into our current lightning.gpu (PennyLane) workflow?

mahout.qdp -> lightning.gpu (zero copy)
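To make the zero-copy idea concrete, here is a minimal host-memory sketch of what the handoff would mean: the consumer adopts the producer's existing buffer as a view rather than copying the amplitudes. The function names (`produce_state`, `wrap_state`) are hypothetical stand-ins, not the Mahout or PennyLane API; a real GPU handoff would exchange a device pointer (e.g. via DLPack) instead of a NumPy array.

```python
import numpy as np

def produce_state(num_qubits: int) -> np.ndarray:
    """Stand-in for mahout.qdp producing a normalized state vector."""
    state = np.zeros(2 ** num_qubits, dtype=np.complex128)
    state[0] = 1.0  # |0...0> basis state
    return state

def wrap_state(buffer: np.ndarray) -> np.ndarray:
    """Stand-in for the consumer adopting the buffer: a view, not a copy."""
    return buffer.view()

state = produce_state(3)
adopted = wrap_state(state)

# Zero copy means both handles alias the same memory.
assert np.shares_memory(state, adopted)
```

The interesting part of the real integration is ownership and lifetime of that shared device buffer, which is exactly where the extra PRs discussed below come in.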
Force-pushed from 4145efa to baa6a90
@ryankert01, after researching, I think the main gap is on the consumer side. From what I can tell, the missing piece is not "can Mahout produce a GPU-resident state?" but "can lightning.gpu adopt that state without a copy?" So my current estimate would be:

So yes: for a proper zero-copy integration, I would expect roughly 3 PRs for a narrow MVP.
@viiccwen Since it's 3 PRs away and fairly large, can we postpone it to the next release? It will be more mature by then, and we can think through the details.
Sure, it'll be fine.
Related Issues
Closed #1295
Changes
- Add `DistributedExecutionContext` so distributed execution is driven by a bundled mesh-plus-collectives object rather than ad hoc device and collective parameters
- Add a `CollectiveCommunicator` seam and an in-process `LocalCollectiveCommunicator` implementation for the current single-process path
- Align `device_id` values so shard metadata, device handles, and active device context stay aligned

Why
How
- Add a `distributed_multigpu_q34_probe` example

```mermaid
sequenceDiagram
    participant Caller as QdpEngine caller
    participant Engine as QdpEngine
    participant Mesh as DeviceMesh
    participant Planner as PlacementPlanner
    participant Ctx as DistributedExecutionContext
    participant Runtime as distributed runtime
    participant Comm as LocalCollectiveCommunicator
    Caller->>Engine: encode distributed amplitude request
    Engine->>Engine: validate input and resolve request
    Engine->>Mesh: build distributed mesh
    Engine->>Ctx: construct execution context
    Engine->>Planner: build placement plan
    Planner-->>Engine: placement + shard ranges
    Engine->>Runtime: execute distributed encode
    Runtime->>Runtime: bind planned device handles
    Runtime->>Comm: reduce local norm contributions
    Comm-->>Runtime: global norm
    Runtime-->>Engine: DistributedStateVector
    Engine-->>Caller: sharded distributed state
```
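The flow above can be sketched in a few lines: shards hold contiguous amplitude ranges, each computes its local norm contribution, and the communicator reduces those into the global norm used to scale every shard. The class names mirror the PR description, but the method names, shard math, and single-process reduction below are assumptions for illustration, not the actual Mahout QDP API.

```python
from abc import ABC, abstractmethod

import numpy as np

class CollectiveCommunicator(ABC):
    """Seam for collective operations; an MPI-backed version would slot in here."""

    @abstractmethod
    def allreduce_sum(self, local_values: list[float]) -> float:
        """Combine per-shard scalar contributions into one global sum."""

class LocalCollectiveCommunicator(CollectiveCommunicator):
    """In-process implementation for the current single-process path."""

    def allreduce_sum(self, local_values: list[float]) -> float:
        return float(sum(local_values))

def shard_ranges(num_amplitudes: int, num_shards: int) -> list[tuple[int, int]]:
    """Split the amplitude index space into contiguous per-device ranges."""
    base, rem = divmod(num_amplitudes, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def distributed_encode(amplitudes, num_shards, comm):
    """Encode amplitudes into normalized shards using a collective norm reduction."""
    ranges = shard_ranges(len(amplitudes), num_shards)
    shards = [np.asarray(amplitudes[s:e], dtype=np.complex128) for s, e in ranges]
    local_norms = [float(np.sum(np.abs(sh) ** 2)) for sh in shards]
    global_norm = comm.allreduce_sum(local_norms)  # the sequence-diagram reduce step
    return [sh / np.sqrt(global_norm) for sh in shards], ranges

comm = LocalCollectiveCommunicator()
shards, ranges = distributed_encode([1, 1, 1, 1], num_shards=2, comm=comm)
```

Because the runtime only talks to the `CollectiveCommunicator` seam, swapping `LocalCollectiveCommunicator` for an MPI allreduce should leave the encode path unchanged, which is presumably the point of the scaffolding.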