
Optimization not progressing with Dask #638

Closed
AnnaGwen opened this issue Jun 10, 2022 · 14 comments
Labels
bug Something isn't working

Comments

@AnnaGwen
Contributor

Environment

  • Covalent version: 0.33.0
  • Python version: 3.8.5
  • Operating system: MacOS Big Sur 11.6

What is happening?

Some optimization routines are not progressing as expected. In the script below, the cost function should be minimized over the course of 100 iterations, but when the workflow is dispatched with the Dask executor it only returns a constant value.
The same issue does not occur when it is dispatched locally.

How can we reproduce the issue?

import pennylane as qml
from pennylane import qaoa
from pennylane import numpy as np
from matplotlib import pyplot as plt
import networkx as nx
import covalent as ct

from dask.distributed import LocalCluster
cluster=LocalCluster()
dask=ct.executor.DaskExecutor(scheduler_address=cluster.scheduler_address)

@ct.electron
def make_graph(qubits,prob):
    graph = nx.generators.random_graphs.gnp_random_graph(n=qubits,p=prob)
    cost_h, mixer_h = qaoa.min_vertex_cover(graph, constrained=False)
    return cost_h,mixer_h


@ct.electron
def get_circuit(cost_h,mixer_h):
    def qaoa_layer(gamma, alpha):
        qaoa.cost_layer(gamma, cost_h)
        qaoa.mixer_layer(alpha, mixer_h)
        

    def circuit(params,wires, **kwargs):
        depth=params.shape[1]
        for w in range(wires):
            qml.Hadamard(wires=w)
        qml.layer(qaoa_layer, depth, params[0], params[1])
    return circuit

@ct.electron
def make_cost_function(circuit,cost_h,qubits):
    dev = qml.device("lightning.qubit", wires=qubits)
    @qml.qnode(dev)
    def cost_function(params):
        circuit(params,wires=qubits)
        return qml.expval(cost_h)

    return cost_function


@ct.electron
def get_random_initialization(p=1,seed=0):
    np.random.seed(seed)
    return np.random.uniform(0,2*np.pi,(2,p),requires_grad=True)

@ct.electron
def initialize_parameters(p=1,qubits=2,prob=0.3,seed=1):
    cost_h,mixer_h=make_graph(qubits=qubits,prob=prob)
    circuit=get_circuit(cost_h,mixer_h)
    cost_function=make_cost_function(circuit,cost_h,qubits)
    initial_angles=get_random_initialization(p=p,seed=seed)
    return cost_function,initial_angles

@ct.electron
def calculate_cost(cost_function,params,optimizer):
    params,loss=optimizer.step_and_cost(cost_function, params)
    return optimizer,params,loss

@ct.electron
def optimize_electron(cost_function,init_angles, optimizer=qml.GradientDescentOptimizer(),iterations=10):
    loss_history=[]
    params = init_angles
    for _ in range(iterations):
        optimizer,params,loss=calculate_cost(cost_function=cost_function,params=params,optimizer=optimizer)
        loss_history.append(loss)
    return loss_history


@ct.electron
def collect_and_mean(array):
    return np.mean(array,axis=0)

@ct.lattice(executor=dask)
def run_exp(p=1,qubits=2,prob=0.3,seed=[1], optimizers=[qml.GradientDescentOptimizer()],iterations=10):
    compare_optimizers=[]
    for s in seed:
        tmp=[]
        cost_function,init_angles=initialize_parameters(p=p,qubits=qubits,prob=prob,seed=s)
        for optimizer in optimizers:
            loss_history=optimize_electron(cost_function=cost_function,init_angles=init_angles,optimizer=optimizer,iterations=iterations)
            tmp.append(loss_history)
        compare_optimizers.append(tmp)
    return collect_and_mean(compare_optimizers)

id=ct.dispatch(run_exp)(p=1,qubits=2,prob=0.3,seed=[1], optimizers=[qml.GradientDescentOptimizer()],iterations=100)
plt.plot(ct.get_result(id,wait=True).result.T)
plt.show()

[Screenshot: optimization_issue (loss stays flat when dispatched with Dask)]

What should happen?

The image below is generated when the same code is run without using Dask.
[Screenshot: optimization_working (loss decreases over the iterations)]

Any suggestions?

No response

@AnnaGwen added the bug label on Jun 10, 2022
@jackbaker1001
Contributor

Also, this bug (or a related bug) seems to pre-exist in another tutorial. Please see https://covalent.readthedocs.io/en/latest/tutorials/machine_learning/mnist_classifier.html . Comparing code cells 11 and 14, you will see that cell 11 executes successfully (the loss decreases) while cell 14 does not: the loss flat-lines like the image above.

Thanks for pointing this out, @ruihao-li. Could you also provide some detail here on your failing case in the autoencoder tutorial?

@cjao
Contributor

cjao commented Jun 10, 2022

I've also seen this with the Quantum Gravity tutorial (all executors).

@ruihao-li
Contributor

Yes, as in the MNIST tutorial, I was training a neural net with PyTorch and encountered this issue with both the local executor and the Dask executor (see PR #639). I was getting a more or less constant training loss with Covalent, but the training worked fine without Covalent.
I will investigate further and post an update if I find anything.

@cjao
Contributor

cjao commented Jun 10, 2022

The loss is flat because the gradients computed in optimizer.step_and_cost() are empty. PennyLane uses autograd to compute the gradient function, and somehow that fails.

A possibly related discussion: dask/distributed#2581 (comment)

@jackbaker1001
Contributor

So @cjao, does this work if you choose a derivative-free optimizer? (Try Powell or something.)
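For reference, a derivative-free check might look roughly like the sketch below. It assumes cost_function and init_angles (the names from the reproduction script) are available outside Covalent, and swaps the gradient-based optimizer for SciPy's Powell method:

from scipy.optimize import minimize

# Powell is derivative-free, so a missing requires_grad flag on the parameters
# would not matter here. cost_function expects params of shape (2, p), while
# SciPy works on a flat 1-D vector, hence the reshape.
res = minimize(
    lambda flat: float(cost_function(flat.reshape(2, -1))),
    init_angles.flatten(),
    method="Powell",
)
print(res.fun)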

@cjao
Contributor

cjao commented Jun 12, 2022

The key to this case lies in the following line:

cost_function,init_angles=initialize_parameters(p=p,qubits=qubits,prob=prob,seed=s)

When the initialize_parameters electron is executed using Dask, the returned init_angles is merely a numpy.ndarray even though initialize_parameters constructs the initial angles

    initial_angles=get_random_initialization(p=p,seed=seed)

as a pennylane.numpy.tensor.tensor. PennyLane tensors have a requires_grad attribute, whereas ordinary NumPy arrays don't, and PennyLane's gradient descent only computes gradients for trainable parameters.
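A minimal sketch of that distinction (not from the original comment; plain NumPy is imported under its own name to keep it distinct from pennylane.numpy):

import numpy
from pennylane import numpy as pnp

t = pnp.random.uniform(0, 2 * numpy.pi, (2, 1), requires_grad=True)
a = numpy.asarray(t)  # forces the base numpy.ndarray class, dropping the subclass

print(hasattr(t, "requires_grad"))  # True: the optimizer treats it as trainable
print(hasattr(a, "requires_grad"))  # False: the optimizer silently skips it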

This problem doesn't occur if the initialize_parameters electron uses the local executor, regardless of whether optimize_electron is run locally or in a Dask worker.

The crux of the issue seems to be that the serialization and deserialization procedures Dask uses to pass data between worker processes somehow convert a pennylane.numpy.tensor.tensor into a plain numpy.ndarray. The following snippet

import pennylane as qml
import pennylane.numpy as np
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster.scheduler_address)

def make_random(p=1):
    return qml.numpy.random.uniform(0,2*np.pi,(2,p),requires_grad=True)

fut = client.submit(make_random)
print(type(fut.result()))

returns <class 'numpy.ndarray'> for me.

In fact

from dask.distributed.protocol import serialize, deserialize

res = qml.numpy.random.uniform(0,2*np.pi,(2,1),requires_grad=True)
res1 = deserialize(*serialize(res))
print(type(res1))

yields <class 'numpy.ndarray'>.

According to the API docs, Dask defaults to a custom (de-)serialization protocol but supports others, including "msgpack" and "pickle". Choosing "pickle" seems to help:

from dask.distributed.protocol import serialize, deserialize

res = qml.numpy.random.uniform(0,2*np.pi,(2,1),requires_grad=True)
res1 = deserialize(*serialize(res, serializers=["pickle"]))
print(type(res1))

yields the much more promising <class 'pennylane.numpy.tensor.tensor'>.

We can set the protocol on a per-client basis: passing serializers=["pickle"] when constructing the Dask client seems to resolve this particular issue.
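As a sketch (assuming the same LocalCluster setup as in the reproduction script; not verified against Covalent's DaskExecutor), the client-side override would look something like:

from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
# Force plain pickle in both directions so pennylane.numpy tensors round-trip intact.
client = Client(
    cluster.scheduler_address,
    serializers=["pickle"],
    deserializers=["pickle"],
)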

A possibly related Dask issue: dask/distributed#3716

@cjao
Contributor

cjao commented Jun 12, 2022

I wonder which of the other training problems (not necessarily using Dask) can also be traced to serialization/deserialization errors.

@cjao
Contributor

cjao commented Jun 12, 2022

@santoshkumarradha

@santoshkumarradha
Member

Good catch @cjao, but from what I remember, when I was trying out a simple PyTorch optimization, things did work. A temporary workaround might be to set the PennyLane interface to PyTorch and see, @jackbaker1001.
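If that refers to PennyLane's PyTorch interface, a rough sketch of the swap (reusing circuit, cost_h, qubits, and init_angles from the reproduction script; whether this survives Dask serialization is exactly what would need testing) might be:

import torch
import pennylane as qml

dev = qml.device("lightning.qubit", wires=qubits)

# interface="torch" makes the QNode accept and return torch tensors, so
# trainability travels with torch's own requires_grad flag instead of
# pennylane.numpy's.
@qml.qnode(dev, interface="torch")
def cost_function(params):
    circuit(params, wires=qubits)
    return qml.expval(cost_h)

params = torch.tensor(init_angles, requires_grad=True)
opt = torch.optim.SGD([params], lr=0.01)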

@santoshkumarradha
Member

Meanwhile @cjao, can we switch the Dask serialization to cloudpickle? I do know cloudpickle enables gradient propagation.

@cjao
Contributor

cjao commented Jun 13, 2022

The MNIST bug (which uses torch) has a different explanation. Let's discuss in a separate issue.

@ruihao-li
Contributor

Hey all, I have created another issue related to this when using PyTorch here.

@cjao
Contributor

cjao commented Jun 13, 2022

Until #648 is addressed, a temporary fix is to use executor="local" for the initialize_parameters electron.
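In terms of the reproduction script, that would look roughly like the sketch below (assuming this Covalent version accepts "local" as an executor short name on @ct.electron):

# Pin only the electron that builds pennylane.numpy tensors to the local executor;
# every other electron still runs through the Dask executor set on the lattice.
@ct.electron(executor="local")
def initialize_parameters(p=1, qubits=2, prob=0.3, seed=1):
    cost_h, mixer_h = make_graph(qubits=qubits, prob=prob)
    circuit = get_circuit(cost_h, mixer_h)
    cost_function = make_cost_function(circuit, cost_h, qubits)
    initial_angles = get_random_initialization(p=p, seed=seed)
    return cost_function, initial_angles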

@cjao
Contributor

cjao commented Jul 3, 2022

Potentially addressed in #748.
