
Optimization not progressing with Dask #638

Closed
AnnaGwen opened this issue Jun 10, 2022 · 14 comments
Labels
bug Something isn't working

Comments

@AnnaGwen
Contributor

Environment

  • Covalent version: 0.33.0
  • Python version: 3.8.5
  • Operating system: MacOS Big Sur 11.6

What is happening?

Some optimization routines are not progressing as expected. In the script below, the cost function should be minimized over the course of 100 iterations, but when the workflow is dispatched with the Dask executor it only returns a constant value.
The same issue does not occur when it is dispatched locally.

How can we reproduce the issue?

import pennylane as qml
from pennylane import qaoa
from pennylane import numpy as np
from matplotlib import pyplot as plt
import networkx as nx
import covalent as ct

from dask.distributed import LocalCluster
cluster=LocalCluster()
dask=ct.executor.DaskExecutor(scheduler_address=cluster.scheduler_address)

@ct.electron
def make_graph(qubits,prob):
    graph = nx.generators.random_graphs.gnp_random_graph(n=qubits,p=prob)
    cost_h, mixer_h = qaoa.min_vertex_cover(graph, constrained=False)
    return cost_h,mixer_h


@ct.electron
def get_circuit(cost_h,mixer_h):
    def qaoa_layer(gamma, alpha):
        qaoa.cost_layer(gamma, cost_h)
        qaoa.mixer_layer(alpha, mixer_h)
        

    def circuit(params,wires, **kwargs):
        depth=params.shape[1]
        for w in range(wires):
            qml.Hadamard(wires=w)
        qml.layer(qaoa_layer, depth, params[0], params[1])
    return circuit

@ct.electron
def make_cost_function(circuit,cost_h,qubits):
    dev = qml.device("lightning.qubit", wires=qubits)
    @qml.qnode(dev)
    def cost_function(params):
        circuit(params,wires=qubits)
        return qml.expval(cost_h)

    return cost_function


@ct.electron
def get_random_initialization(p=1,seed=0):
    np.random.seed(seed)
    return np.random.uniform(0,2*np.pi,(2,p),requires_grad=True)

@ct.electron
def initialize_parameters(p=1,qubits=2,prob=0.3,seed=1):
    cost_h,mixer_h=make_graph(qubits=qubits,prob=prob)
    circuit=get_circuit(cost_h,mixer_h)
    cost_function=make_cost_function(circuit,cost_h,qubits)
    initial_angles=get_random_initialization(p=p,seed=seed)
    return cost_function,initial_angles

@ct.electron
def calculate_cost(cost_function,params,optimizer):
    params,loss=optimizer.step_and_cost(cost_function, params)
    return optimizer,params,loss

@ct.electron
def optimize_electron(cost_function,init_angles, optimizer=qml.GradientDescentOptimizer(),iterations=10):
    loss_history=[]
    params = init_angles
    for _ in range(iterations):
        optimizer,params,loss=calculate_cost(cost_function=cost_function,params=params,optimizer=optimizer)
        loss_history.append(loss)
    return loss_history


@ct.electron
def collect_and_mean(array):
    return np.mean(array,axis=0)

@ct.lattice(executor=dask)
def run_exp(p=1,qubits=2,prob=0.3,seed=[1], optimizers=[qml.GradientDescentOptimizer()],iterations=10):
    compare_optimizers=[]
    for s in seed:
        tmp=[]
        cost_function,init_angles=initialize_parameters(p=p,qubits=qubits,prob=prob,seed=s)
        for optimizer in optimizers:
            loss_history=optimize_electron(cost_function=cost_function,init_angles=init_angles,optimizer=optimizer,iterations=iterations)
            tmp.append(loss_history)
        compare_optimizers.append(tmp)
    return collect_and_mean(compare_optimizers)

id=ct.dispatch(run_exp)(p=1,qubits=2,prob=0.3,seed=[1], optimizers=[qml.GradientDescentOptimizer()],iterations=100)
plt.plot(ct.get_result(id,wait=True).result.T)
plt.show()

[Screenshot: optimization_issue (loss stays flat when dispatched with Dask)]

What should happen?

The image below is generated when the same code is run without using Dask.
[Screenshot: optimization_working (loss decreases over the iterations)]

Any suggestions?

No response

@AnnaGwen added the bug label on Jun 10, 2022
@jackbaker1001
Contributor

Also, this bug (or a related bug) seems to pre-exist in another tutorial. Please see https://covalent.readthedocs.io/en/latest/tutorials/machine_learning/mnist_classifier.html . Comparing code cells 11 and 14, you will see that cell 11 executes successfully (the loss decreases) while cell 14 does not: the loss flat-lines like the image above.

Thanks for pointing this out, @ruihao-li. Could you also provide some detail here on your failing case in the autoencoder tutorial?

@cjao
Contributor

cjao commented Jun 10, 2022

I've also seen this with the Quantum Gravity tutorial (all executors).

@ruihao-li
Contributor

Yes, as in the MNIST tutorial, I was training a neural net with PyTorch and encountered this issue with both the local executor and the Dask executor (see PR #639). I was getting a more or less constant training loss with Covalent, but the training worked fine without Covalent.
I will investigate further and post an update if I find anything.

@cjao
Contributor

cjao commented Jun 10, 2022

The loss is flat because the gradients computed in optimizer.step_and_cost() are empty. PennyLane uses autograd to compute the gradient function, and somehow that fails.

A possibly related discussion: dask/distributed#2581 (comment)

@jackbaker1001
Contributor

So @cjao, does this work if you choose a derivative-free optimizer? (Try Powell or something.)
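For reference, a derivative-free check might look roughly like the sketch below. It assumes cost_function and init_angles (the names from the reproduction script) are available outside Covalent, and swaps the gradient-based optimizer for SciPy's Powell method:

from scipy.optimize import minimize

# Powell is derivative-free, so a missing requires_grad flag on the parameters
# would not matter here. cost_function expects params of shape (2, p), while
# SciPy works on a flat 1-D vector, hence the reshape.
res = minimize(
    lambda flat: float(cost_function(flat.reshape(2, -1))),
    init_angles.flatten(),
    method="Powell",
)
print(res.fun)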

@cjao
Contributor

cjao commented Jun 12, 2022

The key to this case lies in the following line:

cost_function,init_angles=initialize_parameters(p=p,qubits=qubits,prob=prob,seed=s)

When the initialize_parameters electron is executed using Dask, the returned init_angles is merely a numpy.ndarray even though initialize_parameters constructs the initial angles

    initial_angles=get_random_initialization(p=p,seed=seed)

as a pennylane.numpy.tensor.tensor. PennyLane tensors have a requires_grad attribute, whereas ordinary NumPy arrays don't, and PennyLane's gradient descent only computes gradients for trainable parameters.
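A minimal sketch of that distinction (not from the original comment; plain NumPy is imported under its own name to keep it distinct from pennylane.numpy):

import numpy
from pennylane import numpy as pnp

t = pnp.random.uniform(0, 2 * numpy.pi, (2, 1), requires_grad=True)
a = numpy.asarray(t)  # forces the base numpy.ndarray class, dropping the subclass

print(hasattr(t, "requires_grad"))  # True: the optimizer treats it as trainable
print(hasattr(a, "requires_grad"))  # False: the optimizer silently skips it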

This problem doesn't occur if the initialize_parameters electron uses the local executor, regardless of whether optimize_electron is run locally or in a Dask worker.

The crux of the issue seems to be that the serialization and deserialization procedures Dask uses to pass data between worker processes somehow convert a pennylane.numpy.tensor.tensor into a plain numpy.ndarray. The following snippet

import pennylane as qml
import pennylane.numpy as np
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster.scheduler_address)

def make_random(p=1):
    return qml.numpy.random.uniform(0,2*np.pi,(2,p),requires_grad=True)

fut = client.submit(make_random)
print(type(fut.result()))

returns <class 'numpy.ndarray'> for me.

In fact

from dask.distributed.protocol import serialize, deserialize

res = qml.numpy.random.uniform(0,2*np.pi,(2,1),requires_grad=True)
res1 = deserialize(*serialize(res))
print(type(res1))

yields <class 'numpy.ndarray'>.

According to the API docs, Dask defaults to a custom (de-)serialization protocol but supports others, including "msgpack" and "pickle". Choosing "pickle" seems to help:

from dask.distributed.protocol import serialize, deserialize

res = qml.numpy.random.uniform(0,2*np.pi,(2,1),requires_grad=True)
res1 = deserialize(*serialize(res, serializers=["pickle"]))
print(type(res1))

yields the much more promising <class 'pennylane.numpy.tensor.tensor'>.

We can set the protocol on a per-client basis: passing serializers=["pickle"] when constructing the Dask client seems to resolve this particular issue.
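As a sketch (assuming the same LocalCluster setup as in the reproduction script; not verified against Covalent's DaskExecutor), the client-side override would look something like:

from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
# Force plain pickle in both directions so pennylane.numpy tensors round-trip intact.
client = Client(
    cluster.scheduler_address,
    serializers=["pickle"],
    deserializers=["pickle"],
)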

A possibly related Dask issue: dask/distributed#3716

@cjao
Contributor

cjao commented Jun 12, 2022

I wonder which of the other training problems (not necessarily using Dask) can also be traced to serialization/deserialization errors.

@cjao
Contributor

cjao commented Jun 12, 2022

@santoshkumarradha

@santoshkumarradha
Member

Good catch @cjao, but from what I remember, when I was trying out a simple PyTorch optimization, things did work. A temporary workaround might be to set the PennyLane interface to PyTorch and see, @jackbaker1001.
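If that refers to PennyLane's PyTorch interface, a rough sketch of the swap (reusing circuit, cost_h, qubits, and init_angles from the reproduction script; whether this survives Dask serialization is exactly what would need testing) might be:

import torch
import pennylane as qml

dev = qml.device("lightning.qubit", wires=qubits)

# interface="torch" makes the QNode accept and return torch tensors, so
# trainability travels with torch's own requires_grad flag instead of
# pennylane.numpy's.
@qml.qnode(dev, interface="torch")
def cost_function(params):
    circuit(params, wires=qubits)
    return qml.expval(cost_h)

params = torch.tensor(init_angles, requires_grad=True)
opt = torch.optim.SGD([params], lr=0.01)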

@santoshkumarradha
Member

Meanwhile @cjao, can we switch the Dask serialization to cloudpickle? I do know cloudpickle enables gradient propagation.

@cjao
Contributor

cjao commented Jun 13, 2022

The MNIST bug (which uses torch) has a different explanation. Let's discuss in a separate issue.

@ruihao-li
Contributor

Hey all, I have created another issue related to this when using PyTorch here.

@cjao
Contributor

cjao commented Jun 13, 2022

Until #648 is addressed, a temporary fix is to use executor="local" for the initialize_parameters electron.
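In terms of the reproduction script, that would look roughly like the sketch below (assuming this Covalent version accepts "local" as an executor short name on @ct.electron):

# Pin only the electron that builds pennylane.numpy tensors to the local executor;
# every other electron still runs through the Dask executor set on the lattice.
@ct.electron(executor="local")
def initialize_parameters(p=1, qubits=2, prob=0.3, seed=1):
    cost_h, mixer_h = make_graph(qubits=qubits, prob=prob)
    circuit = get_circuit(cost_h, mixer_h)
    cost_function = make_cost_function(circuit, cost_h, qubits)
    initial_angles = get_random_initialization(p=p, seed=seed)
    return cost_function, initial_angles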

@cjao
Contributor

cjao commented Jul 3, 2022

Potentially addressed in #748.
