## Remote-mQPU:

<div>
<img src="./remote-mqpu.png" width="700">
</div>

The multi-QPU NVIDIA platform distributes a workload across several virtual QPUs, each simulated by a single NVIDIA GPU. To run multi-QPU workloads on other simulator backends, use the `remote-mqpu` platform, which encapsulates each simulated QPU as an independent HTTP REST server instance.

By default, auto-launched daemon services do not support MPI parallelism. Hence, using the `nvidia-mgpu` backend to simulate each virtual QPU requires launching each server instance manually.

``` bash
CUDA_VISIBLE_DEVICES=0,1 mpiexec -np 2 cudaq-qpud --port <QPU 1 TCP/IP port number>
CUDA_VISIBLE_DEVICES=2,3 mpiexec -np 2 cudaq-qpud --port <QPU 2 TCP/IP port number>
CUDA_VISIBLE_DEVICES=4,5 mpiexec -np 2 cudaq-qpud --port <QPU 3 TCP/IP port number>
CUDA_VISIBLE_DEVICES=6,7 mpiexec -np 2 cudaq-qpud --port <QPU 4 TCP/IP port number>
```

`<QPU n TCP/IP port number>`: Any network port works as long as it is available, so you can pick arbitrary numbers between roughly 1100 and 49000.

Note: If a port is unavailable, the server reports an error such as "failed to bind to port".
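Port availability can be checked before starting the servers. The following is a minimal sketch using only the Python standard library; the helper names `port_is_free` and `pick_free_ports` are hypothetical and not part of CUDA-Q. A port that is free at check time can still be taken by another process before the server starts, so treat the result as a hint rather than a guarantee.

``` python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def pick_free_ports(count: int, host: str = "127.0.0.1") -> list[int]:
    """Ask the operating system for `count` distinct ports that are free right now."""
    held, ports = [], []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.bind((host, 0))  # port 0 lets the OS choose an unused port
            held.append(sock)     # keep it open so the next pick is different
            ports.append(sock.getsockname()[1])
    finally:
        for sock in held:
            sock.close()
    return ports


ports = pick_free_ports(2)
print("Candidate ports:", ports)
```

The ports returned by `pick_free_ports` can then be passed to the `--port` option of each server instance.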

Note: When you are done with the servers, you need to manually kill them.
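One way to avoid forgetting the cleanup is to start and stop the servers from a small Python script. The sketch below is an illustration under stated assumptions, not a CUDA-Q API: the helpers `build_qpud_command`, `device_list`, `launch_servers`, and `stop_servers` are hypothetical, the command line mirrors the `mpiexec ... cudaq-qpud --port` form used in this section, and it assumes that terminating `mpiexec` also stops the ranks it started, which depends on the MPI implementation.

``` python
import os
import shlex
import subprocess


def build_qpud_command(port: int, ranks: int = 2) -> list[str]:
    """Command line for one server instance (assumed form, see lead-in)."""
    return ["mpiexec", "-np", str(ranks), "cudaq-qpud", "--port", str(port)]


def device_list(server_index: int, gpus_per_server: int = 2) -> str:
    """GPU ids for one server, e.g. index 1 with 2 GPUs per server gives '2,3'."""
    first = server_index * gpus_per_server
    return ",".join(str(g) for g in range(first, first + gpus_per_server))


def launch_servers(ports, gpus_per_server: int = 2):
    """Start one server per port, each restricted to its own block of GPUs."""
    procs = []
    for idx, port in enumerate(ports):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=device_list(idx, gpus_per_server))
        cmd = build_qpud_command(port, gpus_per_server)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs


def stop_servers(procs, timeout: float = 10.0) -> None:
    """Ask every server to exit, then force-kill any that do not stop in time."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


# Print the commands only; nothing is launched here.
for idx, port in enumerate((30001, 30002)):
    print(f"CUDA_VISIBLE_DEVICES={device_list(idx)}",
          shlex.join(build_qpud_command(port)))
```

In a real script, wrapping the workload in `try: ... finally: stop_servers(procs)` ensures the servers are stopped even if the job raises an exception.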

``` python

import cudaq
from cudaq import spin
import numpy as np

np.random.seed(1)

backend = 'nvidia-mgpu'
servers = "localhost:30001,localhost:30002"

# Set the target to execute on and query the number of QPUs in the system;
# The number of QPUs is equal to the number of (auto-)launched server instances.
cudaq.set_target("remote-mqpu",
                    backend=backend,
                    auto_launch=str(servers) if servers.isdigit() else "",
                    url="" if servers.isdigit() else servers)
qpu_count = cudaq.get_target().num_qpus()
print("Number of virtual QPUs:", qpu_count)

qubit_count = 30
sample_count = 2

ham = spin.z(0)

parameter_count = qubit_count


# Below we run the circuit once for each of the `sample_count` parameter sets.
parameters = np.random.default_rng(13).uniform(low=0,
                                               high=1,
                                               size=(sample_count,
                                                     parameter_count))

print('Parameter shape: ', parameters.shape)

@cudaq.kernel
def kernel_rx(theta:list[float]):
    qubits = cudaq.qvector(qubit_count)

    for i in range(qubit_count):
        rx(theta[i], qubits)

# Multi-GPU

# Split the parameter sets into 2 batches, one per virtual QPU.
xi = np.split(parameters,2)

print('We have', parameters.shape[0],
      'parameters which we would like to execute')

print('We split this into', len(xi), 'batches of', xi[0].shape[0], ',',
      xi[1].shape[0])


print('Shape after splitting', xi[0].shape)
asyncresults = []


for i in range(len(xi)):
    for j in range(xi[i].shape[0]):
        asyncresults.append(
            cudaq.observe_async(kernel_rx, ham, xi[i][j, :], qpu_id=i))


print('Energies from multi-GPUs')
for result in asyncresults:
    observe_result = result.get()
    got_expectation = observe_result.expectation()
    print(got_expectation)

```
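The script above hard-codes two batches, and `np.split` raises a `ValueError` when the number of parameter sets is not evenly divisible by the number of batches (`np.array_split` accepts uneven divisions). The following pure-Python sketch shows one way to assign parameter rows to an arbitrary number of virtual QPUs; the helper `assign_to_qpus` is hypothetical and only computes indices, so it runs without CUDA-Q installed.

``` python
def assign_to_qpus(job_count: int, qpu_count: int) -> list[list[int]]:
    """Split job indices 0..job_count-1 into qpu_count contiguous batches.

    Batch sizes differ by at most one, so job_count does not need to be
    a multiple of qpu_count.
    """
    if qpu_count <= 0:
        raise ValueError("qpu_count must be positive")
    base, extra = divmod(job_count, qpu_count)
    batches, start = [], 0
    for qpu_id in range(qpu_count):
        # The first `extra` QPUs take one additional job each.
        size = base + (1 if qpu_id < extra else 0)
        batches.append(list(range(start, start + size)))
        start += size
    return batches


batches = assign_to_qpus(job_count=5, qpu_count=2)
for qpu_id, rows in enumerate(batches):
    print(f"QPU {qpu_id}: parameter rows {rows}")
```

Each row index in `batches[qpu_id]` would then be submitted with `cudaq.observe_async(kernel_rx, ham, parameters[row, :], qpu_id=qpu_id)`, with `qpu_count` taken from `cudaq.get_target().num_qpus()` as in the script.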

To run this job, start the two servers in the background and then run the script:

``` bash
CUDA_VISIBLE_DEVICES=0,1 mpiexec -np 2 cudaq-qpud --port 30001 &

CUDA_VISIBLE_DEVICES=2,3 mpiexec -np 2 cudaq-qpud --port 30002 &

python3 observe-qml-remote-mqpu.py
```

Output:

``` text

Number of virtual QPUs: 2
Parameter shape:  (2, 30)
We have 2 parameters which we would like to execute
We split this into 2 batches of 1 , 1
Shape after splitting (1, 30)
Energies from multi-GPUs
-0.17012869483974752
-0.9906678983711331

```

The `set_target` call in the script supports both manually launched and auto-launched servers:

``` python
cudaq.set_target("remote-mqpu",
                 backend=backend,
                 auto_launch=str(servers) if servers.isdigit() else "",
                 url="" if servers.isdigit() else servers)
```


If the use case is simple enough, the CUDA-Q runtime can auto-launch the servers for you. For example, if `servers` is `'2'`, then `auto_launch='2'` and `url=''`. The CUDA-Q runtime interprets this as a request to auto-launch 2 servers and picks the ports itself, so you do not need to specify a URL.

If `servers` is instead a comma-separated list of servers (such as `'localhost:30001,localhost:30002'`), then `auto_launch=''` and `url='localhost:30001,localhost:30002'`. In that case the CUDA-Q runtime does not auto-launch any servers and connects to the provided URLs when submitting programs to the backend.
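The two cases can be isolated in a small helper so the selection logic is easy to read and test. This is a minimal sketch; `remote_mqpu_options` is a hypothetical name, and the function only builds the `auto_launch` and `url` keyword values described above, so it runs without CUDA-Q installed.

``` python
def remote_mqpu_options(servers: str) -> dict:
    """Map a server spec to the auto_launch/url options used in set_target.

    A purely numeric spec such as '2' requests auto-launch of that many
    servers; anything else is treated as a comma-separated list of URLs.
    """
    servers = servers.strip()
    if servers.isdigit():
        return {"auto_launch": servers, "url": ""}
    return {"auto_launch": "", "url": servers}


for spec in ("2", "localhost:30001,localhost:30002"):
    print(repr(spec), "->", remote_mqpu_options(spec))

# Intended use with the script above:
# cudaq.set_target("remote-mqpu", backend=backend, **remote_mqpu_options(servers))
```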