# Adam Thomson - PHY 573 - Genomic Grover Search
Implementation based on the Qiskit tutorial for Grover's algorithm: https://learning.quantum.ibm.com/tutorial/grovers-algorithm

The goal of this project is to demonstrate that Grover's algorithm can perform simple bioinformatic queries. I first examine why genomic sequence searching is an appropriate application of Grover's algorithm, and then walk through a trivial example in detail. Next, I focus on how to construct the oracle gate for a given genomic query, as well as the full quantum circuit that implements the algorithm. I then compare the results of more complex queries run on a local simulator with those run on real hardware. The demonstration concludes with thoughts about the current limitations of this approach and what the future may hold for the potential field of quantum bioinformatics!

## Setup

In [14]:
# Import libraries
from IPython.display import Math, HTML, display
import math
import matplotlib

# Imports from Qiskit
from qiskit import QuantumCircuit, transpile
from qiskit.circuit.library import GroverOperator, MCMT, ZGate
from qiskit.visualization import plot_distribution, plot_histogram

# Imports from Qiskit Runtime for running on real hardware
from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit_ibm_runtime import SamplerV2 as Sampler
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

In [4]:
# Import constants
from constants import BASE_MAP, REFERENCE_GENOMES

In [3]:
# Initialize local simulator
from qiskit_aer import AerSimulator

sim_sampler = AerSimulator()

In [10]:
# Utility function required for example
def _convert_bp_seq_to_bstr(seq):
    """
    Convert a sequence of genomic base pairs to its bitstring representation.
    """
    # Concatenate the bit code of each base, looked up in BASE_MAP
    return "".join(BASE_MAP[bp] for bp in seq)
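The `constants` module is not shown in this notebook, but the index arithmetic later on (bitstring offsets are divided by 2 to get base-pair indexes) implies a 2-bit code per base. As a minimal sketch, assuming a hypothetical mapping `EXAMPLE_BASE_MAP` that may differ from the real `constants.BASE_MAP`, the conversion of the trivial reference looks like this:

```python
# Hypothetical 2-bit code per base; the real BASE_MAP lives in constants.py and may differ
EXAMPLE_BASE_MAP = {"A": "00", "C": "01", "G": "10", "T": "11"}


def convert_bp_seq_to_bstr(seq, base_map=EXAMPLE_BASE_MAP):
    """Concatenate the 2-bit code of each base in seq."""
    return "".join(base_map[bp] for bp in seq)


ref_bstr = convert_bp_seq_to_bstr("GCAT")
print(ref_bstr)  # 10010011  (G=10, C=01, A=00, T=11)
```

With 2 bits per base, bit offset 2k of the reference bitstring corresponds to base index k, which is why match offsets are halved when they are converted back to base positions.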

## Example Walkthrough

Let's start by examining the steps of the algorithm to see how it performs on the trivial example of a 4-base-pair reference sequence and a 1-base-pair search string with a single match.

In [11]:
# Setup trivial example
trivial_ref = "GCAT"
trivial_search = "A"

# Now convert each sequence to its bitstring representation
trivial_ref_bstr = _convert_bp_seq_to_bstr(trivial_ref)
trivial_search_bstr = _convert_bp_seq_to_bstr(trivial_search)

In [16]:
# We will need 2 qubits to index all 4 possible match positions (log_2 4 = 2)
display(Math(r"""
\ket{\psi_0} = \ket{00}
"""))

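In general, a reference of N bases needs $n = \lceil \log_2 N \rceil$ index qubits. A small sketch of that computation with exact integer arithmetic (the helper name `num_index_qubits` is illustrative, not part of the notebook): for N >= 1, `(N - 1).bit_length()` equals $\lceil \log_2 N \rceil$, which sidesteps any floating-point rounding in a logarithm.

```python
def num_index_qubits(num_bases):
    """Smallest n with 2**n >= num_bases, i.e. ceil(log2(num_bases))."""
    if num_bases < 1:
        raise ValueError("reference must contain at least one base")
    # (N - 1).bit_length() == ceil(log2(N)) for N >= 1, using exact integer math
    return (num_bases - 1).bit_length()


for n_bases in (4, 5, 8, 9, 1024):
    print(n_bases, "bases ->", num_index_qubits(n_bases), "qubits")
```

For the trivial `GCAT` reference this gives 2 qubits; a 5-base reference already needs 3, leaving 3 of its 8 basis states unused as indexes.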

In [24]:
# Begin by putting all qubits into a superposition with Hadamard gates
display(Math(r"""
\begin{aligned}
\ket{\psi_1} &= (H \otimes H)\ket{\psi_0} \\
&= (H \otimes H)\ket{00} \\
&= H\ket{0} \otimes H\ket{0} \\
&= \frac{1}{\sqrt{2}}(\ket{0} + \ket{1}) \otimes \frac{1}{\sqrt{2}}(\ket{0} + \ket{1}) \\
&= \frac{1}{2}(\ket{00} + \ket{01} + \ket{10} + \ket{11})
\end{aligned}
"""))

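The tensor-product steps above can be checked numerically. This sketch represents H|0> as a 2-element amplitude list and takes a Kronecker product in plain Python (the helper `kron` is written here for illustration and is not a Qiskit function). Every basis state ends up with amplitude 1/2, i.e. probability 1/4.

```python
import math

# H|0> = (|0> + |1>) / sqrt(2), as amplitudes ordered [|0>, |1>]
h_ket0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def kron(a, b):
    """Kronecker (tensor) product of two amplitude vectors; a is the left qubit."""
    return [x * y for x in a for y in b]


# Amplitudes ordered [|00>, |01>, |10>, |11>]
psi_1 = kron(h_ket0, h_ket0)
print([round(amp, 6) for amp in psi_1])  # [0.5, 0.5, 0.5, 0.5]
print(sum(amp ** 2 for amp in psi_1))  # total probability, approximately 1.0
```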

In [None]:
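# Since "A" is in the 3rd position of GCAT (index 2 with 0-indexing), our desired state is |10>.
# Its rightmost bit (qubit 0 in Qiskit's ordering) is 0, so we "mark" that qubit with an X gate;
# this maps |10> to |11>, the only state the multi-controlled Z gate acts on
display(Math(r"""
\begin{aligned}
\ket{\psi_2} &= (I \otimes X)\ket{\psi_1} \\
&= \frac{1}{2}(\ket{01} + \ket{00} + \ket{11} + \ket{10})
\end{aligned}
"""))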

In [None]:
# Next, apply the multi-control multi-target (MCMT) Z gate to flip the phase of the desired state.
# With 1 control and 1 target qubit this is a controlled-Z, which multiplies only |11> by -1;
# after the X gate above, |11> is the image of the marked state |10>.
# This acts as our "oracle" function in the larger algorithm
display(Math(r"""
\begin{aligned}
\ket{\psi_3} &= \mathrm{MCMT}\ket{\psi_2} \\
&= \frac{1}{2}(\ket{01} + \ket{00} - \ket{11} + \ket{10})
\end{aligned}
"""))

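For two qubits, `MCMT(ZGate(), 1, 1)` is a controlled-Z gate. Because a controlled-Z is symmetric in its qubits, it simply multiplies the amplitude of |11> by -1, and with more controls the same holds for the all-ones state |11...1>. A plain-Python sketch of that action on the amplitudes of psi_2 (the helper name `multi_controlled_z` is illustrative):

```python
def multi_controlled_z(amplitudes):
    """Flip the sign of the all-ones basis state, as a multi-controlled Z does."""
    out = list(amplitudes)
    out[-1] = -out[-1]  # |11...1> is the last basis state in binary order
    return out


psi_2 = [0.5, 0.5, 0.5, 0.5]  # amplitudes of |00>, |01>, |10>, |11>
psi_3 = multi_controlled_z(psi_2)
print(psi_3)  # [0.5, 0.5, 0.5, -0.5]
```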
In [None]:
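# Apply another X gate to the same qubit to undo the first one and finish "wrapping" the MCMT gate;
# the net effect of X-MCMT-X is a phase flip on the marked state |10> only
display(Math(r"""
\begin{aligned}
\ket{\psi_4} &= (I \otimes X)\ket{\psi_3} \\
&= \frac{1}{2}(\ket{00} + \ket{01} - \ket{10} + \ket{11})
\end{aligned}
"""))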
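Putting the walkthrough together: wrapping the controlled-Z in X gates on every qubit that is 0 in the target moves the phase flip onto |10>, and one Grover diffusion step (reflection about the mean amplitude, $a \mapsto 2\bar{a} - a$) then sends all amplitude to the marked index. This plain-Python statevector sketch is illustrative only; the helpers `apply_x`, `oracle`, and `diffuse` are not Qiskit functions. Qubit 0 is the rightmost bit, matching Qiskit's ordering.

```python
def apply_x(amplitudes, qubit):
    """Pauli-X on `qubit`: swap amplitudes of basis states that differ in that bit."""
    out = list(amplitudes)
    for idx in range(len(out)):
        partner = idx ^ (1 << qubit)
        if idx < partner:
            out[idx], out[partner] = out[partner], out[idx]
    return out


def oracle(amplitudes, target_idx, num_qubits):
    """X-wrapped multi-controlled Z that flips the sign of |target_idx>."""
    zero_qubits = [q for q in range(num_qubits) if not (target_idx >> q) & 1]
    for q in zero_qubits:
        amplitudes = apply_x(amplitudes, q)
    amplitudes = amplitudes[:-1] + [-amplitudes[-1]]  # phase-flip |11...1>
    for q in zero_qubits:
        amplitudes = apply_x(amplitudes, q)
    return amplitudes


def diffuse(amplitudes):
    """Grover diffusion: reflect every amplitude about the mean amplitude."""
    mean = sum(amplitudes) / len(amplitudes)
    return [2 * mean - a for a in amplitudes]


psi = [0.5, 0.5, 0.5, 0.5]  # uniform superposition over |00>, |01>, |10>, |11>
psi = oracle(psi, target_idx=2, num_qubits=2)  # "A" sits at index 2 of "GCAT"
print(psi)  # [0.5, 0.5, -0.5, 0.5]
psi = diffuse(psi)
print([round(a * a, 6) for a in psi])  # probabilities: [0.0, 0.0, 1.0, 0.0]
```

For N = 4 with a single match, one oracle plus diffusion step is enough to measure the match index with certainty; larger references need more iterations.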

### Utility functions

In [6]:
def _find_bstr_index(search_seqs, reference):
    """
    Search the input reference string for instances of the search string(s)
    and return the base-pair index at the start of every match.

    Matching is done on bitstrings, so only matches at even bit offsets are kept;
    an odd offset would be a match that starts in the middle of a base pair.
    """
    # If only searching for a single sequence, convert to list
    if not isinstance(search_seqs, list):
        search_seqs = [search_seqs]

    # Convert the sequences being searched for into bitstrings
    search_bstrs = [
        _convert_bp_seq_to_bstr(base_seq)
        for base_seq in search_seqs
    ]

    # Convert the reference being searched into a bitstring
    ref_bstr = _convert_bp_seq_to_bstr(reference)

    # Initialize list of indexes to be returned
    all_marked_indexes = []

    # Loop through all searches, and find all matches for each
    for search_bstr in search_bstrs:
        # Reset search start index to beginning of reference
        i = 0
        # Continue finding matches until we reach the end of reference
        while i < len(ref_bstr):
            try:
                n = ref_bstr.index(search_bstr, i)
            # Raised once no matches exist in the rest of the reference
            except ValueError:
                break
            # Only mark n if it's even! n % 2 == 1 would imply a match that starts mid-base-pair
            if n % 2 == 0:
                # Integer-divide by 2 to convert from bitstring index to base-pair index
                all_marked_indexes.append(n // 2)
            # Next search only checks the reference after the current match start
            i = n + 1

    return all_marked_indexes
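The even-offset filter matters because a bit pattern can match across a base boundary. A sketch using the same hypothetical 2-bit mapping `A=00, C=01, G=10, T=11` (an assumption; the notebook's real `BASE_MAP` is in `constants.py`): in `GCAT` -> `10010011`, the pattern `00` for `A` occurs at bit offsets 1 and 4, but only offset 4 is base-aligned, giving base index 2. The helper names here are illustrative.

```python
EXAMPLE_BASE_MAP = {"A": "00", "C": "01", "G": "10", "T": "11"}  # hypothetical encoding


def to_bits(seq):
    """Encode a base sequence with the hypothetical 2-bit map."""
    return "".join(EXAMPLE_BASE_MAP[bp] for bp in seq)


def raw_bit_offsets(pattern, text):
    """Every (possibly overlapping) offset where pattern occurs in text."""
    offsets = []
    start = text.find(pattern)
    while start != -1:
        offsets.append(start)
        start = text.find(pattern, start + 1)
    return offsets


ref_bits, search_bits = to_bits("GCAT"), to_bits("A")
offsets = raw_bit_offsets(search_bits, ref_bits)
print(offsets)  # [1, 4]
# Keep only base-aligned (even) offsets and convert bit offsets to base indexes
print([n // 2 for n in offsets if n % 2 == 0])  # [2]
```

Offset 1 is a false hit made of the last bit of `G` (`10`) and the first bit of `C` (`01`), which is exactly what the `n % 2 == 0` check discards.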

In [7]:
def sequence_oracle(base_seqs, reference):
    """
    Build a Grover oracle gate to mark the indexes of input base-pair sequences in the reference sequence

    Parameters:
        base_seqs (str or list): Sequence(s) being searched for
        reference (str): Reference being searched in

    Returns:
        tuple: (QuantumCircuit, int), the quantum circuit representing the Grover oracle
        and the number of marked indexes
    """
    # Compute the number of qubits required for circuit, log_2(N) (always round up);
    # math.log2 is more accurate than math.log(x, 2)
    num_qubits = math.ceil(math.log2(len(reference)))

    # Initialize quantum circuit with that many qubits
    qc = QuantumCircuit(num_qubits)

    # Search the reference for the desired strings and mark the indexes found
    marked_states = _find_bstr_index(base_seqs, reference)
    # Print it out for manual verification
    print(f"Marked the indexes: {marked_states}")

    # Mark each target state in the input list
    for target_idx in marked_states:
        # An index must fit in num_qubits bits, i.e. be less than 2**num_qubits
        if target_idx >= 2 ** num_qubits:
            raise ValueError('Marked an index beyond range of reference, try again')

        # Convert the target index into binary and strip leading '0b' identifier
        target = bin(target_idx)[2:]

        # Pad the binary index with enough 0s to match num_qubits
        pad_target = "0"*(num_qubits-len(target)) + target

        # Flip target bit-string to match Qiskit bit-ordering
        rev_target = pad_target[::-1]

        # Find the indices of all the '0' elements in bit-string
        zero_inds = [ind for ind in range(num_qubits) if rev_target.startswith("0", ind)]

        # Add a multi-controlled Z-gate wrapped in X-gates where the target bit-string has a '0' entry
        qc.x(zero_inds)
        qc.compose(MCMT(ZGate(), num_qubits - 1, 1), inplace=True)
        qc.x(zero_inds)

    return qc, len(marked_states)
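`sequence_oracle` returns the number of marked indexes alongside the circuit, and that count is what the number of Grover iterations depends on. The Qiskit tutorial referenced at the top picks $\lfloor \pi / (4 \arcsin\sqrt{M / 2^n}) \rfloor$ iterations for M marked states among $2^n$. A pure-Python sketch of that formula (the helper name `optimal_num_iterations` is illustrative):

```python
import math


def optimal_num_iterations(num_marked, num_qubits):
    """Grover iteration count floor(pi / (4 * asin(sqrt(M / 2**n))))."""
    if not 0 < num_marked < 2 ** num_qubits:
        raise ValueError("need at least one marked and one unmarked state")
    theta = math.asin(math.sqrt(num_marked / 2 ** num_qubits))
    return math.floor(math.pi / (4 * theta))


print(optimal_num_iterations(1, 2))  # trivial GCAT example -> 1
print(optimal_num_iterations(1, 4))  # one match among 16 indexes -> 3
print(optimal_num_iterations(2, 4))  # two matches among 16 indexes -> 2
```

In the notebook, the arguments would be the second return value of `sequence_oracle` and the oracle circuit's `num_qubits`. More matches mean fewer iterations, which is why the oracle reports how many indexes it marked.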