# Molecular docking via DC-QAOA

Drugs often work by binding to an active site of a protein, inhibiting or activating its function for some therapeutic purpose. Finding new candidate drugs is extremely difficult. The study of molecular docking helps guide this search and involves the prediction of how strongly a certain ligand (drug) will bind to its target (usually a protein).  

One of the primary challenges in molecular docking arises from the many geometric degrees of freedom present in proteins and ligands, which make it difficult to predict the optimal orientation and assess whether the drug is a good candidate. One solution is to formulate docking as a mathematical optimization problem whose optimal solution corresponds to the most likely ligand-protein configuration. This optimization problem can be solved on a quantum computer using methods like the Quantum Approximate Optimization Algorithm (QAOA). This tutorial demonstrates how this [paper](https://arxiv.org/pdf/2308.04098) used digitized-counterdiabatic (DC) QAOA to study molecular docking. It assumes you have an understanding of QAOA; if not, please see the CUDA-Q MaxCut tutorial found [here](https://nvidia.github.io/cuda-quantum/latest/applications/python/qaoa.html).

The next section provides more detail on the problem setup, followed by the CUDA-Q implementation.

### Setting up the Molecular Docking Problem

The figure from the [paper](https://arxiv.org/pdf/2308.04098) provides a helpful diagram for understanding the workflow.

![docking](./images/docking.png)


There are 6 key steps:
1. The experimental protein and ligand structures are determined and used to select pharmacophores, the important chemical groups that govern the chemical interactions.
2. Two labeled distance graphs (LAGs) of size $N$ and $M$ represent the protein and the ligand, respectively. Each node corresponds to a pharmacophore and each edge weight corresponds to the distance between pharmacophores.
3. An $M \times N$ node binding interaction graph (BIG) is created from the LAGs. Each node in the BIG corresponds to a pair of pharmacophores, one from the ligand and the other from the protein. The existence of an edge between two nodes in the BIG is determined from the LAGs and indicates that the two interactions can feasibly coexist. Therefore, cliques in the graph correspond to sets of mutually compatible interactions.
4. The problem is mapped to a QAOA circuit and corresponding Hamiltonian. From there, the ground state solution is determined.
5. The ground state encodes the maximum weighted clique, which corresponds to the best (most strongly bound) orientation of the ligand and protein.
6. The predicted docking structure is interpreted from the QAOA result and is used for further analysis.
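
Steps 2 and 3 can be sketched in plain Python. Everything specific here is an assumption made for illustration: the pharmacophore labels, the distances, and the tolerance `tau` are made-up values, and the compatibility rule (two contacts can coexist when the ligand-side and protein-side distances agree within `tau`) is a simplified stand-in rather than the exact criterion used in the paper.

```python
from itertools import combinations, product

# Hypothetical labeled distance graphs: pairwise distances between
# pharmacophore points. All labels and values are illustrative only.
protein_dist = {("P0", "P1"): 4.0, ("P0", "P2"): 6.5, ("P1", "P2"): 5.0}
ligand_dist = {("L0", "L1"): 4.2, ("L0", "L2"): 9.0, ("L1", "L2"): 5.1}

protein_nodes = ["P0", "P1", "P2"]
ligand_nodes = ["L0", "L1", "L2"]
tau = 0.5  # assumed distance tolerance


def distance(table, a, b):
    """Look up a symmetric distance stored under either key order."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


# Every (ligand point, protein point) pair becomes one BIG node.
big_nodes = list(product(ligand_nodes, protein_nodes))

# Two contacts can coexist if they use different points on both molecules
# and the corresponding distances agree within the tolerance.
big_edges = []
for (l1, p1), (l2, p2) in combinations(big_nodes, 2):
    if l1 == l2 or p1 == p2:
        continue
    gap = abs(distance(ligand_dist, l1, l2) - distance(protein_dist, p1, p2))
    if gap <= tau:
        big_edges.append(((l1, p1), (l2, p2)))

print(len(big_nodes), "BIG nodes,", len(big_edges), "BIG edges")
```

With $M = N = 3$ the BIG has $M \times N = 9$ nodes, and a clique in `big_edges` is a set of contacts that can all be realized at once.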


### CUDA-Q Implementation

First, the appropriate libraries are imported and the `nvidia` backend is selected to run on GPUs if available. This notebook makes use of the [CUDA-Q Solvers library](https://nvidia.github.io/cudaqx/components/solvers/introduction.html) to streamline the workflow. For a fully worked-out QAOA example, see the Max Cut tutorial in the applications section of the CUDA-Q docs.

In [7]:
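import cudaq
from cudaq import spin
import numpy as np
import networkx as nx
# libgfortran is a dependency of solvers, please make sure it is installed on your system first.
!pip install cudaq-solvers -q
import cudaq_solvers as solvers

# Use the GPU-accelerated `nvidia` target when a GPU is present;
# otherwise keep the default simulator.
if cudaq.num_available_gpus() > 0:
    cudaq.set_target("nvidia")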

The block below defines two of the BIG data sets from the paper. The first is a smaller example, and it can be swapped with the commented-out example below it. The weight of each node is set by the nature of the ligand and protein pharmacophores that the node represents. A graph is built from this information using the `networkx` package.

In [9]:
# The two graph inputs from the paper

# BIG 1 (Smaller Simulation)
G = nx.Graph()
edges = [[0, 1], [0, 2], [0, 4], [0, 5], [1, 2], [1, 3], [1, 5], [2, 3], [2, 4],
         [3, 4], [3, 5], [4, 5]]

weights = [0.6686, 0.6686, 0.6686, 0.1453, 0.1453, 0.1453]

for i, weight in enumerate(weights):
    G.add_node(i, weight=weight)
G.add_edges_from(edges)

penalty = 6.0
num_layers = 3

# BIG 2 (More expensive simulation)
#nodes=[0,1,2,3,4,5,6,7]
#edges=[[0,1],[0,2],[0,5],[0,6],[0,7],[1,2],[1,4],[1,6],[1,7],[2,4],[2,5],[2,7],[3,4],[3,5],[3,6],\
#    [4,5],[4,6],[5,6]]
#weights=[0.6686,0.6686,0.6886,0.1091,0.0770,0.0770,0.0770,0.0770]
#penalty=8.0
#num_layers=8

Next, the Hamiltonian is constructed: 

$$H = \frac{1}{2}\sum_{i \in V}w_i(\sigma^z_i - 1) + \frac{P}{4} \sum_{(i,j) \notin E, i \neq j} (\sigma^z_i -1)(\sigma^z_j - 1) $$


The first term rewards each selected vertex by the weight of the pharmacophore pair it represents. The second term is a penalty that is incurred whenever two selected vertices are not connected by an edge, that is, whenever two chosen interactions cannot coexist, so low-energy states correspond to cliques. The penalty $P$ is set by the user and is defined as 6 in the cell above. Solvers handles Hamiltonian construction, so all you need to do is pass the graph and penalty to `get_clique_hamiltonian` to produce a spin operator corresponding to the problem Hamiltonian.

In [11]:
H = solvers.get_clique_hamiltonian(G, penalty=penalty)
print(H)

[1.5+0j] IIZIIZ
[-1.1657+0j] IZIIII
[-1.1657+0j] IIZIII
[-1.42735+0j] IIIIIZ
[-1.1657+0j] ZIIIII
[3.2791499999999996+0j] IIIIII
[-1.42735+0j] IIIZII
[1.5+0j] ZIIZII
[1.5+0j] IZIIZI
[-1.42735+0j] IIIIZI
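
As a sanity check, the printed coefficients can be reproduced by expanding the Hamiltonian by hand. The sketch below uses plain Python and does not call CUDA-Q. It assumes each unordered non-adjacent pair is counted once in the penalty sum, which is the reading that matches the printout, and the variable names are illustrative.

```python
from itertools import combinations

# Same BIG 1 data as in the graph cell above.
edges = [[0, 1], [0, 2], [0, 4], [0, 5], [1, 2], [1, 3], [1, 5], [2, 3],
         [2, 4], [3, 4], [3, 5], [4, 5]]
weights = [0.6686, 0.6686, 0.6686, 0.1453, 0.1453, 0.1453]
penalty = 6.0

n = len(weights)
edge_set = {frozenset(e) for e in edges}
# Pairs of nodes that are NOT connected, each unordered pair listed once.
non_edges = [(i, j) for i, j in combinations(range(n), 2)
             if frozenset((i, j)) not in edge_set]

# (1/2) w_i (Z_i - 1) contributes w_i / 2 to Z_i and -w_i / 2 to the identity.
z_coeff = [w / 2 for w in weights]
identity_coeff = -sum(weights) / 2

# (P/4)(Z_i - 1)(Z_j - 1) = (P/4)(Z_i Z_j - Z_i - Z_j + 1) for each non-edge.
zz_coeff = penalty / 4
for i, j in non_edges:
    z_coeff[i] -= penalty / 4
    z_coeff[j] -= penalty / 4
    identity_coeff += penalty / 4

print("non-edges:", non_edges)
print("ZZ coefficient:", zz_coeff)
print("Z coefficients:", [round(c, 5) for c in z_coeff])
print("identity coefficient:", round(identity_coeff, 5))
```

The three non-adjacent pairs (0, 3), (1, 4), and (2, 5) account for the three two-qubit `Z` terms with coefficient 1.5 in the printout, and the single-qubit and identity coefficients follow from the same expansion.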



The next cell determines the parameter count and initial parameters for a QAOA kernel. The difference between standard QAOA and DC-QAOA is the inclusion of additional counterdiabatic terms that help drive the optimization toward the ground state. These terms are digitized and applied as additional operations following each QAOA layer. The extra parameters are intended to be offset by needing fewer layers. In this example, the DC terms are the additional parameterized $Y$ operations applied to each qubit.
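
A single layer of the ansatz described above can be written compactly. This is a sketch in notation introduced here, not taken from the library: $\gamma_\ell$, $\beta_\ell$, and $\alpha_{\ell,k}$ denote the variational parameters of layer $\ell$, $H_m = \sum_k \sigma^x_k$ is assumed to be the standard QAOA mixer, and the exact gate ordering and parameter sharing inside Solvers may differ.

$$U_\ell = \left[\prod_{k} e^{-i \alpha_{\ell,k} \sigma^y_k}\right] e^{-i \beta_\ell H_m} e^{-i \gamma_\ell H}$$

Setting every $\alpha_{\ell,k} = 0$ recovers a standard QAOA layer.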

Setting `counterdiabatic=True` automatically adjusts the parameter count and the quantum circuit used. Calling `solvers.qaoa()` performs the entire optimization procedure and returns the minimum energy along with the optimal parameters and the sampled states.

In [18]:
parameter_count = solvers.get_num_qaoa_parameters(H,
                                                  num_layers,
                                                  full_parameterization=True,
                                                  counterdiabatic=True)

init_params = np.random.uniform(-np.pi / 8, np.pi / 8, parameter_count)

opt_value, opt_params, opt_config = solvers.qaoa(H,
                                                 num_layers,
                                                 init_params,
                                                 full_parameterization=True,
                                                 counterdiabatic=True)


print()
print('Optimal energy: ', opt_value)
print('Sampled states: ', opt_config)
print('Optimal Configuration: ', opt_config.most_probable())


Optimal energy:  -2.005777257954147
Sampled states:  { 111000:1000 }

Optimal Configuration:  111000
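
Because this instance has only six nodes, the result can be cross-checked classically. The sketch below enumerates all 64 bitstrings and evaluates the Hamiltonian with $\sigma^z_i = 1 - 2x_i$, which reduces it to $-\sum_i w_i x_i + P \sum_{(i,j) \notin E} x_i x_j$. The bit convention is an assumption: character $i$ of the string, read left to right, is taken to be node $i$, and a 1 means the node is selected.

```python
from itertools import combinations, product

# Same BIG 1 data as in the graph cell above.
edges = [[0, 1], [0, 2], [0, 4], [0, 5], [1, 2], [1, 3], [1, 5], [2, 3],
         [2, 4], [3, 4], [3, 5], [4, 5]]
weights = [0.6686, 0.6686, 0.6686, 0.1453, 0.1453, 0.1453]
penalty = 6.0

n = len(weights)
edge_set = {frozenset(e) for e in edges}
non_edges = [(i, j) for i, j in combinations(range(n), 2)
             if frozenset((i, j)) not in edge_set]


def energy(bits):
    """Classical value of H for a 0/1 assignment of the nodes."""
    reward = -sum(w for w, b in zip(weights, bits) if b)
    violations = sum(1 for i, j in non_edges if bits[i] and bits[j])
    return reward + penalty * violations


# Exhaustive search over all 2**n assignments.
best = min(product((0, 1), repeat=n), key=energy)
print("best bitstring:", "".join(map(str, best)))
print("energy:", round(energy(best), 4))
```

The exhaustive minimum is the clique {0, 1, 2}, whose energy is the negative of its total weight, in agreement with the sampled `111000` and the reported optimal energy of about -2.0058.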


The graph below represents the optimal solution from the QAOA procedure and corresponds to the optimal orientation between the ligand and protein.

<img src="./images/partition.png" alt="docking" width="300" />