# Quantum Annealer for Phylogenetic Trees

---

In this notebook, we use D-Wave's Ocean SDK to build the optimization problem needed to reconstruct phylogenetic trees. In short, this is where the real work begins. The work is based on two documents: a book that describes quantum annealing [1] and the paper on phylogenetic tree reconstruction [2].



[1] Combarro, E. F., & Gonzalez-Castillo, S. (2023). A practical guide to quantum machine learning and quantum optimisation: Hands-On Approach to Modern Quantum Algorithms. Packt Publishing.

[2] Onodera, W., Hara, N., Aoki, S., Asahi, T., & Sawamura, N. (2022). Phylogenetic tree reconstruction via graph cut presented using a quantum-inspired computer. Molecular Phylogenetics and Evolution, 178, 107636. https://doi.org/10.1016/j.ympev.2022.107636

In [120]:
import numpy as np
import dimod
from dimod import BinaryQuadraticModel, BINARY
from typing import Optional
from dwave.system import DWaveSampler, EmbeddingComposite
from colorama import Fore

We start with a small example, the model $x_0x_1 + x_0x_2$:

In [46]:
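# Quadratic biases: coefficients of the products x_i*x_j. Squared terms x_i**2 can also
# be given as (i, i); since x_i**2 == x_i for binary variables, they act as linear terms
J = {(0,1):1, (0,2):1}
# Linear biases (none in this example)
h = {}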
problem = BinaryQuadraticModel(h, J, 0.0, BINARY)
print("The problem we are going to solve is:")
print(problem)

The problem we are going to solve is:
BinaryQuadraticModel({0: 0.0, 1: 0.0, 2: 0.0}, {(1, 0): 1.0, (2, 0): 1.0}, 0.0, 'BINARY')
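To see what this model encodes, here is a minimal pure-Python sketch (no D-Wave access needed) that enumerates all $2^3$ assignments and evaluates $E(x) = \sum_i h_i x_i + \sum_{(i,j)} J_{ij} x_i x_j$ with the same `h` and `J`. The `energy` helper is hypothetical and written only for illustration; dimod's `BinaryQuadraticModel.energy` computes the same quantity for a given sample.

```python
import itertools

# Same biases as the example above: no linear terms, two couplings
h = {}
J = {(0, 1): 1, (0, 2): 1}
variables = [0, 1, 2]

def energy(sample, h, J, offset=0.0):
    """Hypothetical helper: offset + sum_i h_i*x_i + sum_(i,j) J_ij*x_i*x_j."""
    linear = sum(bias * sample[v] for v, bias in h.items())
    quadratic = sum(bias * sample[u] * sample[v] for (u, v), bias in J.items())
    return offset + linear + quadratic

# Enumerate every binary assignment (2**3 = 8 of them)
for values in itertools.product([0, 1], repeat=len(variables)):
    sample = dict(zip(variables, values))
    print(values, energy(sample, h, J))
```

Only assignments with $x_0 = 1$ and at least one of $x_1, x_2$ set to 1 pay a cost, so five of the eight assignments reach the minimum energy 0.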


From the paper [2], the Min-cut objective to minimize is:

$$Min_{cut}=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d_{ij}(x_i-x_j)^2,\qquad x_i\in\{0,1\}, \quad i = 1,\dots,n.$$

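Here $d_{ij}$ is the $(i,j)$ entry of the matrix $D$, which stores the differences between pairs of elements. In other words, if we view the problem as a graph, $D$ is its weighted adjacency matrix and $x_i$ indicates on which side of the cut node $i$ lies. Since $(x_i-x_j)^2 = 1$ exactly when $x_i \neq x_j$ (and 0 otherwise), the objective is the total weight of the edges that cross the cut.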

The double sum visits each pair $i<j$ once, that is, only the upper triangle of $D$, and its indices start at 1. Since Python indexes from 0, we rewrite the expression with 0-based indices:

$$Min_{cut}=\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1} d_{ij}(x_i-x_j)^2,\qquad x_i\in\{0,1\}, \quad i = 0,\dots,n-1.$$
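Because $x_i\in\{0,1\}$ implies $x_i^2 = x_i$, each term of the sum splits into linear and quadratic parts. Assuming $D$ is symmetric ($d_{ij}=d_{ji}$), this gives the QUBO coefficients directly:

$$
d_{ij}(x_i-x_j)^2 = d_{ij}\left(x_i^2 - 2x_ix_j + x_j^2\right) = d_{ij}x_i + d_{ij}x_j - 2d_{ij}x_ix_j,
$$

$$
Min_{cut} = \sum_{i=0}^{n-1}\Big(\sum_{j\neq i} d_{ij}\Big)x_i \;-\; 2\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1} d_{ij}\,x_ix_j.
$$

So the linear bias of $x_i$ is the sum of the $i$-th row of $D$, and the coupling between $x_i$ and $x_j$ is $-2d_{ij}$. For the example matrix below, the linear bias of $x_0$ is $92+73+78+92 = 335$ and the coupling of $x_0x_1$ is $-2\cdot 92 = -184$.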

To start, since I don't have any data to build adjacency matrices, I create a random complete (all-to-all) graph with the characteristics defined in the paper. The resulting graph is the following:

<div style="text-align: center;">
    <img src="images/randgraph_1.png" alt="Complete Graph" width="600px">
</div>

Defined by the following adjacency matrix:

$$
\begin{pmatrix}
    0  & 92 & 73 & 78 & 92 \\
    92 & 0  & 21 & 49 & 34 \\
    73 & 21 & 0  & 35 & 63 \\
    78 & 49 & 35 & 0  & 29 \\
    92 & 34 & 63 & 29 & 0 \\
\end{pmatrix}
$$
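The paper's exact procedure for generating random graphs is not reproduced here. As an assumption for illustration only, the sketch below draws integer weights uniformly in $[1, 100)$ (a range that covers the example matrix) and mirrors them, so the result is symmetric with a zero diagonal, i.e. the adjacency matrix of a complete weighted graph. `random_distance_matrix` and its parameters are hypothetical.

```python
import numpy as np

def random_distance_matrix(n: int, low: int = 1, high: int = 100, seed=None) -> np.ndarray:
    """Hypothetical helper: random symmetric weight matrix with zero diagonal.

    The weight range [low, high) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    weights = rng.integers(low, high, size=(n, n))
    upper = np.triu(weights, k=1)   # keep only the strictly upper-triangular entries
    return upper + upper.T          # mirror them; the diagonal stays 0

D = random_distance_matrix(5, seed=0)
print(D)
```

Fixing `seed` makes the matrix reproducible between runs.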

First, we write a function that builds ``BinaryQuadraticModel`` objects for our problem from a given matrix. This simplifies the rest of the notebook.

In [73]:
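# Quick check of the dictionary accumulation used in create_problem:
# add to the value if the key exists, otherwise create it
w = {}
if (0,0) in w:
    w[(0,0)]+=1
else:
    w[(0,0)]=1

w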

{(0, 0): 1}

In [101]:
# Function to create a BinaryQuadraticModel from a numpy matrix
def create_problem(matrix: np.ndarray) -> BinaryQuadraticModel:
    r"""
    Creates a BinaryQuadraticModel from a numpy matrix using the Min-cut formulation.
    Only the entries below the main diagonal are read, so both symmetric matrices and
    matrices with zeros above the main diagonal work.
    
    Args:
        `matrix`: Square matrix that defines the problem.
    
    Returns:
        The BinaryQuadraticModel from dimod that defines the problem.
    """
    
    # No explicit linear terms: each x_i**2 term is stored under the key (i, i) of J,
    # and since x_i**2 == x_i for binary variables, dimod turns it into a linear bias
    h = {}
    J = {}
    rows, cols = matrix.shape
    
    if rows != cols:
        raise ValueError('The matrix MUST be square')
    
    # Expand d_ij*(x_i - x_j)**2 = d_ij*x_i**2 - 2*d_ij*x_i*x_j + d_ij*x_j**2
    for i in range(rows):
        for j in range(i):
            coef = matrix[i, j]
            # x_i**2 term (accumulated over all neighbours of i)
            J[(i, i)] = J.get((i, i), 0) + coef
            
            # Cross term x_i*x_j
            J[(i, j)] = -2 * coef
            
            # x_j**2 term (accumulated over all neighbours of j)
            J[(j, j)] = J.get((j, j), 0) + coef
    print(J)
    problem = BinaryQuadraticModel(h, J, 0.0, BINARY)
    return problem

In [102]:
matrix = np.array([[0,0,0,0,0],[92,0,0,0,0],[73,21,0,0,0],[78,49,35,0,0],[92,34,63,29,0]])

problem = create_problem(matrix)
print(problem)

{(1, 1): np.int64(196), (1, 0): np.int64(-184), (0, 0): np.int64(335), (2, 2): np.int64(192), (2, 0): np.int64(-146), (2, 1): np.int64(-42), (3, 3): np.int64(191), (3, 0): np.int64(-156), (3, 1): np.int64(-98), (3, 2): np.int64(-70), (4, 4): np.int64(218), (4, 0): np.int64(-184), (4, 1): np.int64(-68), (4, 2): np.int64(-126), (4, 3): np.int64(-58)}
BinaryQuadraticModel({1: 196.0, 0: 335.0, 2: 192.0, 3: 191.0, 4: 218.0}, {(0, 1): -184.0, (2, 1): -42.0, (2, 0): -146.0, (3, 1): -98.0, (3, 0): -156.0, (3, 2): -70.0, (4, 1): -68.0, (4, 0): -184.0, (4, 2): -126.0, (4, 3): -58.0}, 0.0, 'BINARY')
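The loops in `create_problem` can also be written in vectorized NumPy form using the expansion of $(x_i - x_j)^2$: the linear bias of $x_i$ is the $i$-th row sum of the symmetric matrix, and each coupling is $-2d_{ij}$. The sketch below is a hypothetical alternative (`mincut_qubo_coefficients` is not part of dimod) that reproduces the biases printed above; its dictionaries could then be passed to `BinaryQuadraticModel(linear, quadratic, 0.0, BINARY)` as in the first example.

```python
import numpy as np

def mincut_qubo_coefficients(matrix: np.ndarray):
    """Hypothetical helper: linear and quadratic biases of the Min-cut QUBO.

    Like create_problem, it only reads the entries below the main diagonal,
    so symmetric and lower-triangular matrices give the same result.
    """
    lower = np.tril(np.asarray(matrix), k=-1)   # strictly lower triangle
    weights = lower + lower.T                   # symmetric, zero diagonal
    # Linear bias of x_i: total weight of the edges touching node i
    linear = {i: int(s) for i, s in enumerate(weights.sum(axis=1))}
    # Coupling of x_i*x_j (i > j) is -2*d_ij; zero-weight pairs are skipped
    rows, cols = np.nonzero(lower)
    quadratic = {(int(i), int(j)): int(-2 * lower[i, j]) for i, j in zip(rows, cols)}
    return linear, quadratic

matrix = np.array([[0, 0, 0, 0, 0],
                   [92, 0, 0, 0, 0],
                   [73, 21, 0, 0, 0],
                   [78, 49, 35, 0, 0],
                   [92, 34, 63, 29, 0]])
linear, quadratic = mincut_qubo_coefficients(matrix)
print(linear)             # {0: 335, 1: 196, 2: 192, 3: 191, 4: 218}
print(quadratic[(1, 0)])  # -184
```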


Note that the next cell accesses the D-Wave quantum annealer, so it needs a configured D-Wave API token to run.

In [None]:
# Requires a configured D-Wave API token (e.g. set up with `dwave config create`)
sampler = EmbeddingComposite(DWaveSampler())
result = sampler.sample(problem, num_reads=10)
print("The solutions that we have obtained are")
print(result)

The solutions that we have obtained are
   0  1  2  3  4 energy num_oc. chain_.
0  0  0  0  0  0    0.0       5     0.0
1  1  1  1  1  1    0.0       5     0.0
['BINARY', 2 rows, 10 samples, 5 variables]
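Both samples returned above are the trivial partitions (all nodes on the same side), which cut no edge and therefore have energy 0: with non-negative weights, the unconstrained Min-cut objective is always minimized by not cutting at all. A brute-force sketch over all $2^n$ assignments (only practical for small $n$) confirms this and finds the best cut with both sides non-empty. `cut_value` is a hypothetical helper that evaluates the 0-based formula directly.

```python
import itertools
import numpy as np

# Example matrix from above (lower-triangular form); W is its symmetric version
D = np.array([[0, 0, 0, 0, 0],
              [92, 0, 0, 0, 0],
              [73, 21, 0, 0, 0],
              [78, 49, 35, 0, 0],
              [92, 34, 63, 29, 0]])
W = D + D.T

def cut_value(x, W):
    """Hypothetical helper: sum over i < j of W[i, j] * (x_i - x_j)**2."""
    n = len(x)
    return int(sum(W[i, j] * (x[i] - x[j]) ** 2
                   for i in range(n) for j in range(i + 1, n)))

n = W.shape[0]
energies = {x: cut_value(x, W) for x in itertools.product([0, 1], repeat=n)}

# Trivial partitions: every node on the same side, so no edge is cut
print(energies[(0,) * n], energies[(1,) * n])  # 0 0

# Best cut among partitions with both sides non-empty
nontrivial = {x: e for x, e in energies.items() if 0 < sum(x) < n}
best = min(nontrivial, key=nontrivial.get)
print(best, nontrivial[best])  # (0, 0, 0, 1, 0) 191
```

Here the cheapest non-trivial cut isolates node 3 (its edges weigh $78+49+35+29 = 191$), and the complementary assignment $(1,1,1,0,1)$ has the same cost. Ruling out the trivial partitions therefore requires some additional constraint.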


In [55]:
# D-Wave reports timing values in microseconds
qpu_us = result.info["timing"]["qpu_access_time"]
print(f"QPU access time: {qpu_us} µs ({qpu_us / 1000:.2f} ms)")

QPU access time: 16541.56 µs (16.54 ms)


In [145]:
x0 = dimod.Binary('x0')
x1 = dimod.Binary('x1')
x2 = dimod.Binary('x2')

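blp = dimod.ConstrainedQuadraticModel()

# Toy 3-node Min-cut objective (weights 3 for x0-x1, 3 for x1-x2, 1 for x0-x2)
# plus the penalty term (x0+x1+x2-1)**2
blp.set_objective(3*(x0-x1)**2 + 3*(x1-x2)**2 + (x0-x2)**2 + (x0+x1+x2-1)**2)
# Hard constraint: exactly one variable equal to 1
blp.add_constraint(x0 + x1 + x2 == 1)
# blp.add_constraint(x0 + x1 + x2 <= 2)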

print(blp)

solver = dimod.ExactCQMSolver()

sol = solver.sample_cqm(blp)

print('Solutions')
print(sol,'\n')

# We want the best feasible solution. We can filter by its feasibility and take the first element
feas_sol = sol.filter(lambda s: s.is_feasible)
print(Fore.RED+'Best Solution'+Fore.RESET)
print(f'Variables: {feas_sol.first.sample}, Cost = {feas_sol.first.energy}')

Constrained quadratic model: 3 variables, 1 constraints, 9 biases

Objective
  1 + 3*Binary('x0') + 5*Binary('x1') + 3*Binary('x2') - 4*Binary('x0')*Binary('x1') - 4*Binary('x1')*Binary('x2')

Constraints
  cca69ae: Binary('x0') + Binary('x1') + Binary('x2') == 1.0

Bounds

Solutions
  x0 x1 x2 energy num_oc. is_sat. is_fea.
0  0  0  0    1.0       1 arra... np.F...
2  1  0  0    4.0       1 arra... np.T...
4  0  0  1    4.0       1 arra... np.T...
7  1  1  1    4.0       1 arra... np.F...
3  1  1  0    5.0       1 arra... np.F...
5  0  1  1    5.0       1 arra... np.F...
1  0  1  0    6.0       1 arra... np.T...
6  1  0  1    7.0       1 arra... np.F...
['INTEGER', 8 rows, 8 samples, 3 variables] 

Best Solution
Variables: {'x0': np.int64(1), 'x1': np.int64(0), 'x2': np.int64(0)}, Cost = 4.0


In [144]:
# cqm_to_bqm returns a (bqm, inverter) tuple; the inverter maps BQM samples back to CQM samples
qubo = dimod.cqm_to_bqm(blp, lagrange_multiplier=5)
print(qubo)

(BinaryQuadraticModel({'x0': 3.0, 'x1': 5.0, 'x2': 3.0}, {('x1', 'x0'): -4.0, ('x2', 'x0'): 0.0, ('x2', 'x1'): -4.0}, 1.0, 'BINARY'), <dimod.constrained.constrained.CQMToBQMInverter object at 0x000002089A377640>)
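`cqm_to_bqm` turns each constraint into a penalty weighted by `lagrange_multiplier`; for an equality constraint such as $x_0+x_1+x_2=1$ the usual choice is $\lambda(x_0+x_1+x_2-1)^2$. The model printed above contains only the objective's biases (3, 5, 3 and offset 1) with no penalty terms, which suggests it was produced before the constraint was added to `blp` (cell 144 ran before cell 145). As a hedged sketch of what to expect once the constraint is present, the hypothetical helper below expands the penalty by hand for $\lambda = 5$ and adds it to the objective's biases.

```python
from itertools import combinations

# Objective biases of the CQM above: 1 + 3*x0 + 5*x1 + 3*x2 - 4*x0*x1 - 4*x1*x2
linear = {"x0": 3.0, "x1": 5.0, "x2": 3.0}
quadratic = {("x0", "x1"): -4.0, ("x1", "x2"): -4.0}
offset = 1.0

def add_one_hot_penalty(linear, quadratic, offset, variables, lam):
    """Hypothetical helper: add lam * (sum(x) - 1)**2 for binary x.

    Since x_i**2 == x_i, the penalty expands to
    lam * (1 - sum_i x_i + 2 * sum_{i<j} x_i * x_j).
    """
    lin = dict(linear)
    quad = dict(quadratic)
    for v in variables:
        lin[v] = lin.get(v, 0.0) - lam
    for u, v in combinations(variables, 2):
        quad[(u, v)] = quad.get((u, v), 0.0) + 2 * lam
    return lin, quad, offset + lam

lin, quad, off = add_one_hot_penalty(linear, quadratic, offset, ["x0", "x1", "x2"], lam=5.0)
print(lin)   # {'x0': -2.0, 'x1': 0.0, 'x2': -2.0}
print(quad)  # {('x0', 'x1'): 6.0, ('x1', 'x2'): 6.0, ('x0', 'x2'): 10.0}
print(off)   # 6.0
```

Feasible assignments keep their objective value (for example $x_0=1$ gives $6-2=4$), while the infeasible $x=(0,0,0)$ now costs $1+5=6$ instead of 1.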


TODO list for tomorrow:

- [ ] Build the Min-cut model with binary variables
- [ ] Check that the constraints work as intended
- [ ] Try the Min-cut model without the constraints
- [ ] Run things
