# BlockEncoding class 201: Qubitization, Chebyshev polynomials, QSP

Welcome back to the tutorial on Quantum Linear Algebra. In the previous tutorial, we learned how to "embed" a non-unitary matrix $A$ in the top-left block of a larger unitary $U$ using the [BlockEncoding class](../../../reference/Block%20Encodings/BlockEncoding.rst). But just having $A$ block encoded usually isn't enough.

Often, we want to compute functions of that matrix, like $e^{-iAt}$ for simulating physics (Hamiltonians), $A^{-1}$ for solving linear systems, or even step functions or Gaussians as filters for ground-state preparation.

Classically, if you have a number $x$, you can calculate $f(x)$ using a Taylor series or a polynomial approximation. In quantum computing, we can actually do the same thing by using an approach called [Quantum Signal Processing (QSP)](https://dspace.mit.edu/handle/1721.1/115025). This will be covered towards the end of this tutorial. To get there, we first need to learn how to use the BlockEncoding class to turn a block encoding into a "Quantum Walk" via Qubitization.

After learning about Qubitization, this tutorial will also explain how to block-encode [Chebyshev polynomials](https://en.wikipedia.org/wiki/Chebyshev_polynomials) and why they are useful. Namely, we'll cover the [quantum Lanczos method](https://quantum-journal.org/papers/q-2023-05-23-1018/pdf/), a technique that uses the Krylov subspace (built from Chebyshev polynomials) to estimate the ground state energy of a Hamiltonian without needing complex time-evolution. We will also cover the [Childs-Kothari-Somma (CKS) algorithm](https://arxiv.org/pdf/1511.02306), which is a near-optimal linear system solver that uses block encoded Chebyshev polynomials in another layer of LCU, this time with a neat (spoiler alert: unary encoding) trick.

We will then show how to run the algorithms stemming from [generalized QSP](https://arxiv.org/pdf/2308.01501) in Qrisp, and how the [BlockEncoding class](../../../reference/Block%20Encodings/BlockEncoding.rst) makes polynomial transformations, solving linear systems, and Hamiltonian simulation as simple as calling the [.poly(coeffs)](../../../reference/Block%20Encodings/methods/poly.rst), [.inv(eps, kappa)](../../../reference/Block%20Encodings/methods/inv.rst), and [.sim(t, N)](../../../reference/Block%20Encodings/methods/sim.rst) methods respectively.

Here's a summary:

| Method | Purpose | Mathematical Basis |
| :--- | :--- | :--- |
| [.qubitization()](../../../reference/Block%20Encodings/methods/qubitization.rst) | Transforms $A$ into a "walk operator" $W$ | Interleaved [reflection](../../../reference/Primitives/reflection.rst) + [q_switch](../../../reference/Primitives/qswitch.rst) |
| [.chebyshev(k)](../../../reference/Block%20Encodings/methods/chebyshev.rst) | Computes the $k$-th Chebyshev polynomial $T_k$ | Iterative application of $W^k$ ([.qubitization](../../../reference/Block%20Encodings/methods/qubitization.rst)) |
| [.poly(coeffs)](../../../reference/Block%20Encodings/methods/poly.rst) | Applies an arbitrary polynomial transformation $P(A)$ | [GQSP](https://arxiv.org/pdf/2308.01501) |
| [.inv(eps, kappa)](../../../reference/Block%20Encodings/methods/inv.rst) | Solves the [linear system](https://arxiv.org/pdf/2411.02522) $Ax = b$ | $1/x$ polynomial approximation |
| [.sim(t, N)](../../../reference/Block%20Encodings/methods/sim.rst) | Simulates Hamiltonian evolution $e^{-iHt}$ | Jacobi-Anger expansion (Bessel functions) |

But first things first. Let's break down the concept called qubitization.

## Qubitization
If a Block Encoding is a "static snapshot" of a matrix $A$, Qubitization is what makes it "move". Technically, Qubitization is a method to transform an $(\alpha, m, \epsilon)$-block-encoding of a matrix $A$ into a special unitary operator $W$, often referred to as the "walk operator". This operator has a nice property: it maps the eigenvalues $\lambda$ of $A$ to the eigenvalues $e^{\pm i \arccos(\lambda/\alpha)}$ in a set of two-dimensional invariant subspaces. 

Given a Hermitian matrix $H$ and its block-encoding $(U, G)$, where $G\ket{0} = \ket{G}$, we use the definition of the reflection operator $R$ acting on the ancilla space as $R = (2\ket{G}\bra{G}_a - \mathbb{1}_a)\otimes \mathbb{1}_{s}$ from [Lemma 1 in Exact and efficient Lanczos method on a quantum computer](https://arxiv.org/pdf/2208.00567). To "qubitize" the encoding, we interleave the SELECT operator with this reflection. The Qubitized Walk Operator $W$ is defined as $W = \text{SELECT}\cdot R$.

Rigorous analysis (see [Lin Lin, Chapter 8](https://arxiv.org/pdf/2201.08309)) shows that if $\ket{\psi_\lambda}$ is an eigenvector of $H/\alpha$ with eigenvalue $\lambda \in [-1, 1]$, the operator $W$ acts on a 2D subspace spanned by $\ket{G}\ket{\psi_\lambda}$ and its orthogonal complement $\ket{\perp}$. Within this subspace, the eigenvalues of $W$ are $\mu_\pm = \lambda \pm i\sqrt{1-\lambda^2} = e^{\pm i \arccos(\lambda)}$. Essentially, Qubitization "lifts" the eigenvalues of our matrix onto the unit circle in the complex plane, allowing us to manipulate them using phase rotations.
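To see this eigenvalue "lifting" with actual numbers, here is a minimal NumPy sketch using plain dense matrices (it is not Qrisp code). The toy Hamiltonian $H = 0.6X + 0.8Z$, the variable names, and the ancilla-first tensor ordering are assumptions chosen purely for illustration. It builds SELECT, $R$, and $W = \text{SELECT}\cdot R$ by hand and compares the eigenphases of $W$ with $\pm\arccos(\lambda/\alpha)$.

```python
import numpy as np

# Single-qubit Paulis for a toy LCU: H = 0.6*X + 0.8*Z (illustrative choice)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
coeffs = np.array([0.6, 0.8])
paulis = [X, Z]

alpha = coeffs.sum()                                  # LCU normalization
H = sum(c * P for c, P in zip(coeffs, paulis))

# PREP state |G> on one ancilla qubit and SELECT = sum_i |i><i| (x) P_i
G = np.sqrt(coeffs / alpha)
SELECT = sum(np.kron(np.outer(e, e), P) for e, P in zip(np.eye(2), paulis))

# Reflection about |G> on the ancilla, identity on the system
R = np.kron(2 * np.outer(G, G) - np.eye(2), np.eye(2))
W = SELECT @ R

# Eigenphases of W versus +/- arccos(lambda / alpha)
lam = np.linalg.eigvalsh(H) / alpha
expected = np.sort(np.concatenate([np.arccos(lam), -np.arccos(lam)]))
phases = np.sort(np.angle(np.linalg.eigvals(W)))
print("phases of W:      ", phases)
print("+/- arccos(lam/a):", expected)
```

The sketch relies on SELECT being a Hermitian unitary, which holds here because every LCU term is a Pauli operator.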

This seems like a mouthful, but in order for this tutorial to cater to both developers coming from the classical domain, as well as researchers in quantum computing, it's the "necessary evil". To make it up to you, we're going to show how in Qrisp, you don't need to build these reflections manually. But first, some visual aid so that you see that it's not as complex as it sounds.

![Alt text](qubitization.png)

In the previous tutorial you've already learned how to implement $\text{SELECT}$ by just calling [qswitch](../../../reference/Primitives/qswitch.rst). Well, what if we told you that to perform the reflection operation above you can just use the [reflection](../../../reference/Primitives/reflection.rst) function? Yup, that's the cool thing about the modular software development approach Qrisp is taking with its focus on high-level abstractions. Let's get to coding!

### Qubitization in Qrisp as [.qubitization()](../../../reference/Block%20Encodings/methods/qubitization.rst)

While understanding the internal mechanics of q_switch and reflection is valuable for intuition, Qrisp abstracts this complexity away for standard operations. The [BlockEncoding class](../../../reference/Block%20Encodings/BlockEncoding.rst) features a dedicated method, [.qubitization()](../../../reference/Block%20Encodings/methods/qubitization.rst), which automatically constructs the walk operator $W$ from your input matrix. This method handles the heavy lifting: it identifies the necessary reflection operators $R$ and interleaves them with the signal oracle (the block-encoding unitary $U$). If the original block-encoding unitary $U$ isn't Hermitian (i.e., $U^2 \neq \mathbb{1}$), Qrisp automatically handles the Hermitian embedding, often requiring one additional ancilla qubit, to ensure the walk operator remains unitary.

Here is how you can transform a Hamiltonian into its qubitized walk operator in just a few lines:

In [None]:
from qrisp.block_encodings import BlockEncoding
from qrisp.operators import X, Y, Z

# 1. Define a Hamiltonian
# We create a simple Hamiltonian H = X_0 Y_1 + 0.5 Z_0 X_1
H = X(0)*Y(1) + 0.5*Z(0)*X(1)

# 2. Create the initial Block Encoding
# This generates the (U, G) pair discussed above
BE = BlockEncoding.from_operator(H)

# 3. Generate the Qubitized Walk Operator
# This automates the construction of W = SELECT * R
BE_walk = BE.qubitization()

The resulting object, ``BE_walk``, is a new [BlockEncoding](../../../reference/Block%20Encodings/BlockEncoding.rst) instance representing the walk operator $W$. A thing to remember from this example is the fact that when you invoke methods of the [BlockEncoding](../../../reference/Block%20Encodings/BlockEncoding.rst) class like [.qubitization](../../../reference/Block%20Encodings/methods/qubitization.rst), or later [.poly](../../../reference/Block%20Encodings/methods/poly.rst) and [.sim](../../../reference/Block%20Encodings/methods/sim.rst), Qrisp qubitizes your BlockEncoding object under the hood, handling the ancilla management and reflection logic for you, abstracting away the need to know how to implement these methods as (trigger warning) circuits.

Crucially, the [.qubitization](../../../reference/Block%20Encodings/methods/qubitization.rst) operator is also how one encodes the Chebyshev polynomials of the Hamiltonian, as we'll learn in the next part of the tutorial. 

## Block encoding Chebyshev polynomials
One of the most powerful features of Qubitization is its natural relationship with Chebyshev polynomials of the first kind, $T_k(x)$, defined as $T_k(\cos \theta) = \cos(k\theta)$. If we apply the walk operator $W$ $k$ times, the resulting unitary $W^k$ contains $T_k(A/\alpha)$ block encoded in the top-left block:
$$(\bra{G} \otimes \mathbb{1}) W^k (\ket{G} \otimes \mathbb{1}) = T_k(A/\alpha).$$
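The identity above is easy to check numerically. The following NumPy sketch (dense matrices, not Qrisp code) builds $W = \text{SELECT}\cdot R$ for an assumed toy Hamiltonian $H = X_0X_1 + 0.5\,Z_0$, projects $W^k$ onto $\ket{G}$, and compares the result with $T_k(H/\alpha)$ computed from the Chebyshev recurrence. All names and the ancilla-first ordering are illustrative choices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Toy Hamiltonian H = X0 X1 + 0.5 Z0 as an LCU of Hermitian unitaries (illustrative)
coeffs = np.array([1.0, 0.5])
terms = [np.kron(X, X), np.kron(Z, I2)]
alpha = coeffs.sum()
A = sum(c * P for c, P in zip(coeffs, terms)) / alpha    # H / alpha

G = np.sqrt(coeffs / alpha)
SELECT = sum(np.kron(np.outer(e, e), P) for e, P in zip(np.eye(2), terms))
R = np.kron(2 * np.outer(G, G) - np.eye(2), np.eye(4))
W = SELECT @ R

# Isometry |G> (x) 1 used to read out the top-left block
V = np.kron(G.reshape(2, 1), np.eye(4))

def chebyshev_matrix(k, M):
    """T_k(M) from the recurrence T_{k+1} = 2 M T_k - T_{k-1}."""
    T_prev, T_cur = np.eye(len(M)), M
    if k == 0:
        return T_prev
    for _ in range(k - 1):
        T_prev, T_cur = T_cur, 2 * M @ T_cur - T_prev
    return T_cur

checks = []
for k in range(6):
    block = V.T @ np.linalg.matrix_power(W, k) @ V
    checks.append(bool(np.allclose(block, chebyshev_matrix(k, A))))
print(checks)
```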

"But what's so special about Chebyshev polynomials", you might be wondering. As noted in Lin Lin’s lecture notes, Chebyshev polynomials are "optimal" in two senses:

- Iterative Efficiency: Because $W$ is a single unitary, applying $W^k$ requires only $k$ queries to the block encoding. This is much cheaper than the $O(2^n)$ terms often required by naive Taylor series expansions.

- Approximation Theory: Truncated Chebyshev expansions give near-best uniform approximations of smooth functions over the interval $[-1, 1]$ (the truly best uniform approximation is the one characterized by the Chebyshev Equioscillation Theorem). This lets our quantum algorithm reach the desired precision $\epsilon$ with a polynomial degree, and hence quantum resources, close to the minimum possible.
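A quick classical experiment illustrates the approximation-theory point. This NumPy sketch compares a degree-8 Chebyshev interpolant of $e^x$ on $[-1, 1]$ with the degree-8 Taylor polynomial around $0$; the test function and the degree are arbitrary choices for illustration.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

deg = 8
x = np.linspace(-1, 1, 2001)

# Degree-8 Chebyshev interpolant of exp(x) on [-1, 1]
cheb_coeffs = C.chebinterpolate(np.exp, deg)
cheb_err = np.max(np.abs(C.chebval(x, cheb_coeffs) - np.exp(x)))

# Degree-8 Taylor polynomial of exp(x) around 0
taylor_coeffs = [1 / math.factorial(n) for n in range(deg + 1)]
taylor_err = np.max(np.abs(P.polyval(x, taylor_coeffs) - np.exp(x)))

print(f"Chebyshev max error: {cheb_err:.2e}")
print(f"Taylor max error:    {taylor_err:.2e}")
```

For the same degree, and hence the same number of walk-operator applications, the Chebyshev polynomial spreads its error evenly over the interval instead of concentrating it at the endpoints.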

Again, we think some visual aid is needed. By applying the [.qubitization](../../../reference/Block%20Encodings/methods/qubitization.rst) operator $k$ times, i.e., $W^k=(RU)^k$, we block encode the $k$-th Chebyshev polynomial of the first kind $T_k$. Do it once ($k=1$), and you get the top left figure. Do it twice ($k=2$), and you block-encode $T_2$ (top right figure). Do it $k=5$ times... yup, you guessed it (bottom right figure):

![Alt text](cheb_poly.png)

Just as with the basic walk operator, Qrisp abstracts the iterative application of $W$ into a simple method call. The [BlockEncoding class](../../../reference/Block%20Encodings/BlockEncoding.rst) provides a [.chebyshev(k)](../../../reference/Block%20Encodings/methods/chebyshev.rst) method, which returns a new block encoding for the $k$-th Chebyshev polynomial $T_k$. This handles the construction of $W^k$ (or the appropriate sequence of reflections and select/[q_switch](../../../reference/Primitives/qswitch.rst) operators) internally.

Here is how to generate and apply a Chebyshev polynomial transformation to a Hamiltonian:

In [None]:
from qrisp import *
from qrisp.block_encodings import BlockEncoding
from qrisp.operators import X, Y, Z

# 1. Define the Hamiltonian
H = X(0)*X(1) + 0.5*Z(0)*Z(1)

# 2. Create the initial Block Encoding
BE = BlockEncoding.from_operator(H)

# 3. Create the Block Encoding for T_2(H)
# This generates the circuit for the 2nd order Chebyshev polynomial
BE_cheb_2 = BE.chebyshev(k=2)

# 4. Use the new Block Encoding
# For example, applying it via Repeat-Until-Success (RUS) to a state
def operand_prep():
    # Prepare an initial state, e.g., uniform superposition on 2 qubits
    qv = QuantumFloat(2)
    h(qv)
    return qv

@terminal_sampling
def main(BE):
    # Apply the block encoded operator T_2(H) to the state
    qv = BE.apply_rus(operand_prep)()
    return qv

# Execute
result = main(BE_cheb_2)
print(result)

{0.0: 0.4900000153295707, 3.0: 0.48999992592259706, 1.0: 0.010000029373916136, 2.0: 0.010000029373916136}


The [.chebyshev(k)](../../../reference/Block%20Encodings/methods/chebyshev.rst) method is particularly useful for building polynomial approximations where $T_k$ terms are the basis functions. By default (``rescale=True``), it returns a block-encoding of $T_k(H)$, managing the normalization factors via Quantum Eigenvalue Transformation (QET) logic. If you need the raw polynomial $T_k(H/\alpha)$ relative to the block-encoding's normalization $\alpha$, you can set ``rescale=False``.

As learned in the previous tutorial you can also perform resource analysis by just calling the [.resources()](../../../reference/Block%20Encodings/methods/resources.rst) method.

In [None]:
qv = operand_prep()

cheb_resources = BE_cheb_2.resources(qv)
print(cheb_resources)

{'gate counts': {'rx': 2, 'p': 2, 'x': 8, 'u3': 4, 'gphase': 4, 'cx': 12, 'cz': 6}, 'depth': 33}


Let's, at this point, also show how we could do a quick benchmark of how the required resources scale. Since the walk operator $W=(RU)$ is exactly the block encoding overhead for Chebyshev polynomials, we can see how the resources scale with repeated applications of it to block-encode $T_k$.

In [None]:
for k in range(1, 8):
    qv = operand_prep()
    # Generate the k-th Chebyshev Block Encoding
    # We use rescale=False to look at the raw complexity of the walk operator iterations
    BE_cheb = BE.chebyshev(k, rescale=False)
    
    # Extract resource dictionary
    cheb_resources = BE_cheb.resources(qv)
    print(f"k = {k}: {cheb_resources}")

k = 1: {'gate counts': {'z': 1, 'x': 5, 'u3': 2, 'gphase': 3, 'cx': 4, 'cz': 2}, 'depth': 16}
k = 2: {'gate counts': {'z': 2, 'x': 10, 'u3': 4, 'gphase': 6, 'cx': 8, 'cz': 4}, 'depth': 32}
k = 3: {'gate counts': {'z': 3, 'x': 15, 'u3': 6, 'gphase': 9, 'cx': 12, 'cz': 6}, 'depth': 48}
k = 4: {'gate counts': {'z': 4, 'x': 20, 'u3': 8, 'gphase': 12, 'cx': 16, 'cz': 8}, 'depth': 64}
k = 5: {'gate counts': {'z': 5, 'x': 25, 'u3': 10, 'gphase': 15, 'cx': 20, 'cz': 10}, 'depth': 80}
k = 6: {'gate counts': {'z': 6, 'x': 30, 'u3': 12, 'gphase': 18, 'cx': 24, 'cz': 12}, 'depth': 96}
k = 7: {'gate counts': {'z': 7, 'x': 35, 'u3': 14, 'gphase': 21, 'cx': 28, 'cz': 14}, 'depth': 112}


If you look at the printed output of your benchmark, you’ll notice a very satisfying trend: the gate counts and depth grow linearly with $k$.

In the classical world, high-order polynomial approximations often come with a heavy computational tax. On a quantum computer, thanks to Qubitization, the $k$-th Chebyshev polynomial $T_k$ is implemented simply by repeating the walk operator $W$ exactly $k$ times. This efficiency is the "secret sauce" behind many modern quantum algorithms. We, therefore, get high-precision approximations without the exponential gate-count explosion.

## Quantum Lanczos method

The ability to efficiently block-encode Chebyshev polynomials isn't just a mathematical flex; it’s a prerequisite for one of the most exciting algorithms in recent years: the Quantum Lanczos Method.

As detailed in the paper [Exact and efficient Lanczos method on a quantum computer](https://arxiv.org/pdf/2208.00567), these polynomials are used to construct what is known as a Krylov subspace. By applying different orders of $T_k$ to an initial state, we can "scan" the spectrum of a Hamiltonian without performing expensive real or imaginary time evolution.

To put it in a more digestible way, the Lanczos method projects the Hamiltonian into a much smaller, manageable subspace. We construct a projected Hamiltonian matrix $\mathbf{H}$ and an overlap matrix $\mathbf{S}$ within this subspace. The matrix elements are defined as:
$$S_{ij} = \bra{\psi_0} T_i(H) T_j(H) \ket{\psi_0} = \braket{T_i(H)T_j(H)}_0$$
$$H_{ij} = \bra{\psi_0} T_i(H) H T_j(H) \ket{\psi_0} = \braket{T_i(H)HT_j(H)}_0$$

Using the product identities of Chebyshev polynomials, these can be broken down into linear combinations of simple expectation values $\braket{T_k(H)}_0$.

If you're wondering why you should care, this allows for highly accurate Ground State Preparation! By solving the generalized eigenvalue problem $\mathbf{H}\vec{v}=\epsilon \mathbf{S}\vec{v}$ classically, we can find the lowest eigenvalue (the ground state energy) and the corresponding state with far fewer resources than traditional Phase Estimation.
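The classical half of this pipeline fits in a few lines. Below is a minimal NumPy sketch, assuming the Hamiltonian has been rescaled so that its spectrum lies in $[-1, 1]$ (so that $H = T_1(H)$) and replacing the quantum measurements of $\braket{T_k(H)}_0$ with exact classical values. The toy Hamiltonian, the initial state, the helper names, and the simple eigenvalue cutoff used for regularization are all illustrative assumptions, not the internals of Qrisp.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

# Toy Hamiltonian, scaled so that its spectrum lies inside [-1, 1]
H = (np.kron(X, X) + 0.5 * np.kron(Z, I2) + 0.5 * np.kron(I2, Z)) / 2
psi0 = np.array([1.0, 2.0, 3.0, 4.0])
psi0 /= np.linalg.norm(psi0)

def chebyshev_moments(H, psi, n):
    """Return <psi|T_k(H)|psi> for k = 0..n-1 (stand-in for the quantum measurements)."""
    t_prev, t_cur = psi, H @ psi
    moments = [np.vdot(psi, t_prev).real, np.vdot(psi, t_cur).real]
    for _ in range(n - 2):
        t_prev, t_cur = t_cur, 2 * H @ t_cur - t_prev
        moments.append(np.vdot(psi, t_cur).real)
    return np.array(moments[:n])

def krylov_matrices(m, D):
    """Build S and the projected H from moments via T_i T_j = (T_{i+j} + T_{|i-j|}) / 2."""
    mom = lambda k: m[abs(k)]
    S = np.array([[0.5 * (mom(i + j) + mom(i - j)) for j in range(D)] for i in range(D)])
    Hk = np.array([[0.25 * (mom(i + j + 1) + mom(i + j - 1) + mom(i - j + 1) + mom(i - j - 1))
                    for j in range(D)] for i in range(D)])
    return S, Hk

def lowest_energy(S, Hk, cutoff=1e-10):
    """Solve Hk v = E S v after discarding near-null directions of S."""
    s, U = np.linalg.eigh(S)
    keep = s > cutoff * s.max()
    B = U[:, keep] / np.sqrt(s[keep])
    return np.linalg.eigvalsh(B.T @ Hk @ B).min()

exact = np.linalg.eigvalsh(H).min()
estimates = []
for D in range(1, 5):
    S, Hk = krylov_matrices(chebyshev_moments(H, psi0, 2 * D), D)
    estimates.append(lowest_energy(S, Hk))
    print(f"D = {D}: estimate = {estimates[-1]:.6f}, exact = {exact:.6f}")
```

Note that a Krylov space of dimension $D$ only needs the moments $\braket{T_k(H)}_0$ for $k = 0, \dots, 2D-1$.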

In Qrisp, the lanczos_alg function automates this entire pipeline: measuring the expectation values via Qubitization, building the matrices, regularizing them to handle noise, and solving the eigenvalue problem.

Here is an example estimating the ground state energy of a 1D Heisenberg model. We use a tensor product of singlets as our initial guess $\ket{\psi_0}$ to ensure a good overlap with the true ground state.

In [None]:
from qrisp import QuantumVariable
from qrisp.algorithms.lanczos import lanczos_alg
from qrisp.operators import X, Y, Z
from qrisp.vqe.problems.heisenberg import create_heisenberg_init_function
from qrisp.jasp import jaspify
import networkx as nx

# 1. Define a 1D Heisenberg model on 6 qubits
L = 6
G = nx.cycle_graph(L)
# H = (1/4) * sum over edges (i, j) of (X_i X_j + Y_i Y_j + 0.5 Z_i Z_j)
H = (1/4)*sum((X(i)*X(j) + Y(i)*Y(j) + 0.5*Z(i)*Z(j)) for i,j in G.edges())

# 2. Prepare initial state function (tensor product of singlets)
# This acts as our |psi_0>
M = nx.maximal_matching(G)
U_singlet = create_heisenberg_init_function(M)

def operand_prep():
    qv = QuantumVariable(H.find_minimal_qubit_amount())
    U_singlet(qv)
    return qv

# 3. Run the Quantum Lanczos Algorithm
# D is the Krylov space dimension
D = 6  

# We use jaspify for high-performance JIT compilation/tracing
@jaspify(terminal_sampling=True)
def main():
    # lanczos_alg handles the block encoding, measurements, and diagonalization
    return lanczos_alg(H, D, operand_prep, show_info=True)

energy, info = main()
print(f"Ground state energy estimate: {energy}")

# 4. Compare with exact classical diagonalization
print(f"Exact Ground state energy: {H.ground_state_energy()}")

Ground state energy estimate: -2.361955976722853
Exact Ground state energy: -2.368033988749906


## Childs-Kothari-Somma algorithm

Before we jump into arbitrary polynomials, there is one specific application of Chebyshev polynomials combined with Linear Combination of Unitaries (LCU) that deserves its own spotlight: The Childs-Kothari-Somma (CKS) algorithm.

The CKS algorithm solves the Quantum Linear System Problem (QLSP) $A\vec{x} = \vec{b}$. It does this by approximating the function $1/x$ over the domain $D_{\kappa} = [-1, -1/\kappa] \cup [1/\kappa, 1]$ using a linear combination of Chebyshev polynomials. Visually, this is represented by the domains highlighted in light blue in the following figure:

![Alt text](inverse.png)

The algorithm constructs a block-encoding of the inverse operator:
$$A^{-1} \propto \sum_{j=0}^{j_0} \alpha_{2j+1} T_{2j+1}(A),$$
where $\alpha_j$ are carefully calculated coefficients. The more terms we keep, the better the approximation, as seen above, where the truncation order $j_0$ is shown next to the corresponding precision $\epsilon$.
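To get a feel for these coefficients, here is a small standard-library Python sketch that evaluates the closed-form expansion from the CKS paper: the function $(1-(1-x^2)^b)/x$ with $b = \lceil \kappa^2 \log(\kappa/\epsilon) \rceil$ is expanded in odd Chebyshev polynomials and truncated at $j_0 = \lceil \sqrt{b \log(4b/\epsilon)} \rceil$. The helper names are made up for this example, the coefficients are returned with their alternating signs included, and the Qrisp implementation may organize the computation differently.

```python
import math

def cks_coefficients(kappa, eps):
    """Signed Chebyshev coefficients of T_1, T_3, ..., T_{2*j0+1} for the 1/x approximation."""
    b = math.ceil(kappa**2 * math.log(kappa / eps))
    j0 = math.ceil(math.sqrt(b * math.log(4 * b / eps)))
    # tail[j] = sum_{i=j+1}^{b} binom(2b, b+i), accumulated from the top with exact integers
    tail, running = [0] * (j0 + 1), 0
    for i in range(b, 0, -1):
        running += math.comb(2 * b, b + i)
        if i - 1 <= j0:
            tail[i - 1] = running
    return [4 * (-1) ** j * tail[j] / 4**b for j in range(j0 + 1)]

def inverse_approx(x, coeffs):
    """Evaluate sum_j c_j T_{2j+1}(x) using T_n(cos(theta)) = cos(n * theta)."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos((2 * j + 1) * theta) for j, c in enumerate(coeffs))

kappa = 3.0
results = []
for eps in (1e-1, 1e-2, 1e-3):
    coeffs = cks_coefficients(kappa, eps)
    grid = [1 / kappa + (1 - 1 / kappa) * n / 400 for n in range(401)]
    err = max(abs(inverse_approx(x, coeffs) - 1 / x) for x in grid)
    results.append((eps, len(coeffs) - 1, err))
    print(f"eps = {eps:g}: j0 = {len(coeffs) - 1}, max error on [1/kappa, 1] = {err:.1e}")
```

Because the approximant is odd, checking the positive half of $D_\kappa$ is enough. The printout also shows the trade-off directly: a smaller $\epsilon$ requires a larger truncation order $j_0$.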

For the visual learners, here is the circuit schematic for the underlying implementation (which, as seen in the example below, you can simply call via the [CKS](../../../reference/Algorithms/CKS.rst) function).

![Alt text](BE_CKS.png)

The unary encoding trick is a neat one, that's for sure. It prepares a unary state (instead of the usual $\text{PREP}$ state of standard LCU), so that each block needs only a single control qubit instead of the multi-qubit control of the usual [qswitch](../../../reference/Primitives/qswitch.rst).

Here is the state in maths:
$$\ket{\text{unary}} \propto \sqrt{\alpha_1}\ket{100\dots00} + (-1) \sqrt{\alpha_3}\ket{110\dots00} + (-1)^2 \sqrt{\alpha_5}\ket{111\dots00} + \cdots + (-1)^{j_0}\sqrt{\alpha_{2j_0+1}}\ket{111\dots11}.$$

The negative signs are handled by applying $Z$ gates to the outer ancillary variable. Once this superposition is prepared, the first state, $\ket{100\dots00}$, corresponding to $\alpha_1$, "activates" only the first unitary, ergo $T_1(A)$ (top left in figure below). The second part of the unary superposition, $\ket{110\dots00}$, corresponding to $\alpha_3$, activates $T_3(A)$ (top right in figure below). Same thing for $\ket{111\dots00}$ (bottom left in figure below), and so on until $k=2j_0+1$ (bottom right in figure below), where $j_0$ is the truncation order from the initial CKS paper and all of the $(RU)^k$ "light up". Time for the "figure below" promised above:

![Alt text](unary_magic.png)

Since the first $(RU)$ is always triggered, we can basically remove the control on that one. The Qrisp implementation does just that.
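The structure of the unary state is simple enough to tabulate by hand. This standard-library Python sketch lists the basis states and amplitudes for a made-up set of positive weights $\alpha_1, \alpha_3, \dots$ (the numbers and the helper name are purely illustrative) and shows which Chebyshev polynomial each basis state selects.

```python
import math

def unary_state(alphas):
    """Amplitudes of the unary LCU state for positive weights alpha_1, alpha_3, ..."""
    total = sum(alphas)
    n = len(alphas)
    return {
        "1" * (j + 1) + "0" * (n - j - 1): (-1) ** j * math.sqrt(a / total)
        for j, a in enumerate(alphas)
    }

state = unary_state([1.25, 0.75, 0.3, 0.05])   # made-up weights, j0 = 3
for bits, amp in state.items():
    j = bits.count("1") - 1
    print(f"|{bits}>  amplitude {amp:+.3f}  ->  T_{2 * j + 1}(A)")
```

Every basis state starts with a `1`, which is why the control on the first $(RU)$ can be dropped, and ancilla qubit $i$ is set exactly in those terms that need the $i$-th block, so a single control per block is enough.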

"But wait!", you might think. "I know another, more famous algorithm for solving linear systems under the name HHL. Why use CKS over that algorithm?"

Precision. The complexity of CKS scales as $\mathcal{O}(\text{polylog}(1/\epsilon))$ in the precision, representing an exponentially better precision scaling compared to HHL. This makes it feasible to get high-accuracy solutions without an explosion in circuit depth. As we will see later (foreshadowing again here), QSP/GQSP offers an even more efficient approach, which you can use by simply calling the [.inv(eps, kappa)](../../../reference/Block%20Encodings/methods/inv.rst) method. How sick, right?!

Oh, and $\epsilon$ here is (as already mentioned) the precision, with $\kappa$ being the condition number. The higher the $\kappa$, the more difficult it is to solve the linear system. The smaller the $\epsilon$ (i.e., the higher the target precision), the more terms are needed in the Chebyshev approximation, which costs more resources.

Time for an example. In Qrisp, the CKS function takes a Block Encoding of matrix $A$ and returns a new Block Encoding representing $A^{-1}$. You can then apply this inverted operator to your state $\ket{b}$ using the Repeat-Until-Success (RUS) protocol. Here is how to solve a 4x4 Hermitian system:

In [None]:
import numpy as np
from qrisp import prepare, QuantumFloat
from qrisp.algorithms.cks import CKS
from qrisp.block_encodings import BlockEncoding
from qrisp.jasp import terminal_sampling

# 1. Define the linear system Ax = b
A = np.array([[0.73255474, 0.14516978, -0.14510851, -0.0391581],
              [0.14516978, 0.68701415, -0.04929867, -0.00999921],
              [-0.14510851, -0.04929867, 0.76587818, -0.03420339],
              [-0.0391581, -0.00999921, -0.03420339, 0.58862043]])

b = np.array([0, 1, 1, 1])

# Calculate condition number (needed for the polynomial approximation domain)
kappa = np.linalg.cond(A)

# 2. Define state preparation for vector |b>
def prep_b():
    operand = QuantumFloat(2)
    prepare(operand, b)
    return operand

@terminal_sampling
def main():
    # Convert matrix A to a Block Encoding
    BA = BlockEncoding.from_array(A)
    
    # Create the Block Encoding for A^-1 using CKS
    # This internally calculates coefficients and builds the LCU circuit
    BA_inv = CKS(BA, eps=0.01, kappa=kappa)
    
    # Apply A^-1 to |b> using Repeat-Until-Success
    x = BA_inv.apply_rus(prep_b)()
    return x

# 3. Execute and compare results
res_dict = main()

# Extract amplitudes
amps = np.sqrt([res_dict.get(i, 0) for i in range(len(b))])
print("QUANTUM SIMULATION\n", amps)

# Calculate classical solution for verification
c = (np.linalg.inv(A) @ b) / np.linalg.norm(np.linalg.inv(A) @ b)
print("CLASSICAL SOLUTION\n", c)

QUANTUM SIMULATION
 [0.02711477 0.55709846 0.53035149 0.63846174]
CLASSICAL SOLUTION
 [0.02944539 0.55423278 0.53013239 0.64102936]


Feel free to experiment with the precision and use [.resources](../../../reference/Block%20Encodings/methods/resources.rst) to see how it impacts gate counts and depth, which is a direct consequence of approximating the inverse with a higher-degree Chebyshev polynomial $T_k$. We also encourage you to try the 'custom block encoding' for the Laplacian operator from the previous tutorial; as noted in BlockEncoding 101, the [.from_lcu](../../../reference/Block%20Encodings/methods/from_lcu.rst) approach is significantly more efficient. In fact, we really urge you to try it now! The Laplacian is a genuinely relevant sparse matrix where quantum speedups can be really "felt" since it's used in applications ranging from fluid dynamics to Quantum Support Vector Machines in quantum machine learning (proving there's substance behind the hype, after all!).

While Chebyshev polynomials are the "optimal" choice for many tasks, they are still just one type of polynomial. What if you want to implement a step function to filter states? Or an inverse function $1/x$ for solving linear systems of equations? Or a complex exponential $e^{-ixt}$ for Hamiltonian simulation?

To do that, we need a more generalized framework that treats the walk operator not just as a repeating block, but as a tunable sequence. This brings us to the "Grand Unified Theory" of quantum algorithms: Quantum Signal Processing (QSP).

In the next section, we’ll see how Qrisp takes everything we've learned about block encodings and qubitization to let you implement an arbitrary polynomial transformation by simply calling .poly(), and how the same machinery solves linear systems via [.inv(eps, kappa)](../../../reference/Block%20Encodings/methods/inv.rst).

## Quantum Signal Processing (QSP)

While LCU and Qubitization allow us to block-encode operators and Chebyshev polynomials $T_k$ (the latter by simply repeating a walk operator $W=RU$), Quantum Signal Processing (QSP) provides a way to implement arbitrary polynomial transformations $P(A)$.

At its core, QSP manipulates a single-qubit "signal" using a sequence of rotations. If we have a signal operator $W(x)$ that encodes some value $x \in [-1, 1]$, and we interleave it with a series of phase shifts $e^{i\phi_j Z}$, the resulting product of unitaries can be written as:
$$U_\Phi(x) = e^{i\phi_0 Z} \prod_{j=1}^d W(x) e^{i\phi_j Z}$$

Through a clever choice of the phase angles $\{\phi_0, \phi_1, \dots, \phi_d\}$, the block-encoding (top left block) of this unitary becomes a polynomial $P(x)$. 

The Fundamental Theorem of QSP states that there exists a set of phase angles $\{\phi_0, \dots, \phi_d\}$ such that the top-left block of $U_\Phi$ corresponds to a polynomial $P(A/\alpha)$ where:

- $\text{deg}(P) \leq d$

- $P$ has parity $d \pmod 2$ (it is either purely even or purely odd), and

- $|P(x)| \leq 1$ for all $x \in [-1, 1]$
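These three properties can be checked numerically for a scalar signal. The NumPy sketch below assumes the common convention $W(x) = \begin{pmatrix} x & i\sqrt{1-x^2} \\ i\sqrt{1-x^2} & x \end{pmatrix}$ for the signal operator (the text above does not fix one) and evaluates the top-left entry of $U_\Phi(x)$. With all phases set to zero it reproduces $T_d(x)$; with random phases it still has parity $d \bmod 2$ and magnitude at most $1$.

```python
import numpy as np

def qsp_unitary(x, phases):
    """U_Phi(x) = e^{i phi_0 Z} prod_{j=1}^{d} W(x) e^{i phi_j Z} for a scalar x in [-1, 1]."""
    s = np.sqrt(1 - x**2)
    W = np.array([[x, 1j * s], [1j * s, x]])          # assumed signal-operator convention
    rot = lambda phi: np.diag([np.exp(1j * phi), np.exp(-1j * phi)])
    U = rot(phases[0])
    for phi in phases[1:]:
        U = U @ W @ rot(phi)
    return U

xs = np.linspace(-1, 1, 201)
d = 5

# All phases zero: the top-left entry reduces to the Chebyshev polynomial T_d(x)
P_zero = np.array([qsp_unitary(x, np.zeros(d + 1))[0, 0] for x in xs])
is_chebyshev = bool(np.allclose(P_zero, np.cos(d * np.arccos(xs))))

# Random phases: P still has parity d mod 2 and |P(x)| <= 1
rng = np.random.default_rng(0)
phases = rng.uniform(-np.pi, np.pi, d + 1)
P_rand = np.array([qsp_unitary(x, phases)[0, 0] for x in xs])
has_parity = bool(np.allclose(P_rand[::-1], (-1) ** d * P_rand))
is_bounded = bool(np.max(np.abs(P_rand)) <= 1 + 1e-12)

print(is_chebyshev, has_parity, is_bounded)
```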

I know, I know, this was quite a lot of theory, but as usual, we're here to make things simple with Qrisp. We have made it possible for you to not even worry about the classical math of finding these angles! Obtaining them involves some heavy Laurent series and optimization, already included in Qrisp as an internal "angle solver" that handles this "classical nightmare" for you. You can therefore treat these complex mathematical transformations as simple method calls!

As a final point of emphasis here, the main advantage of QSP lies in its optimality: a degree-$d$ polynomial costs only $d$ applications of the signal operator, so a continuous function can be approximated to within error $\epsilon$ at a cost set by the degree of its polynomial approximation, which for many functions of interest meets the theoretical lower bounds for quantum query complexity.

### Quantum Eigenvalue and Singular Value Transformation

Building on our discussion of Qubitization and LCU, we can now dive into the [Grand Unification of quantum algorithms](https://arxiv.org/pdf/2105.02859): [Quantum Singular Value Transformation (QSVT)](https://dl.acm.org/doi/epdf/10.1145/3313276.3316366). In the context of [Lin Lin’s lecture notes](https://arxiv.org/pdf/2201.08309), these methods represent the most efficient way to process matrices on a quantum computer by treating a matrix as a [BlockEncoding](../../../reference/Block%20Encodings/BlockEncoding.rst).

QSVT applies a polynomial $P$ to the singular values of a matrix $A$ without needing to perform a full Singular Value Decomposition (SVD). 

Disclaimer: a bit more maths before showing examples of how to perform this simply and intuitively as methods of the class we've been covering.

Consider a matrix $A \in \mathbb{C}^{m \times n}$ with $\|A\| \leq 1$ and SVD $A = \sum_{i} \sigma_i \ket{w_i} \bra{v_i}$. If we have an $(\alpha, m, \epsilon)$-block encoding $U$ of $A$, our goal is to construct a new unitary $U_\Phi$ that implements:
$$P(A) = \sum_{i} P(\sigma_i) \ket{w_i} \bra{v_i}.$$
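As a purely classical sanity check of this definition, the following NumPy sketch applies the odd polynomial $T_3(x) = 4x^3 - 3x$ to the singular values of a random rectangular matrix and confirms that the result equals $4AA^\dagger A - 3A$, i.e., something that can be written without ever computing an SVD. The matrix, its size, and the variable names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
A /= 1.1 * np.linalg.norm(A, 2)             # enforce ||A|| < 1

# Singular value transformation with the odd polynomial T_3(x) = 4x^3 - 3x
Wm, sigma, Vh = np.linalg.svd(A, full_matrices=False)
P = lambda x: 4 * x**3 - 3 * x
P_sv = Wm @ np.diag(P(sigma)) @ Vh

# The same operator written without an explicit SVD: 4 A A^dagger A - 3 A
P_direct = 4 * A @ A.conj().T @ A - 3 * A
print(np.allclose(P_sv, P_direct))
```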

QSVT does this by using Projector-Controlled Phase gates interleaved with the block encoding $U$. Let $\Pi = \ket{0}\bra{0}^a \otimes \mathbb{1}$ be the projector onto the subspace where $A$ lives. The QSVT circuit for a degree-$d$ polynomial is:
$$U_\Phi = e^{i\phi_1(2\Pi - I)} U e^{i\phi_2(2\Pi - I)} U^\dagger e^{i\phi_3(2\Pi - I)} U \dots.$$
To visualize this, the following circuit schematics can help.

![Alt text](BE_QSVT.png)

As promised above, you don't even need to care about these angles with our crispy clean implementation. Generalizing QSP is the final piece of this mosaic.

### Lifting constraints and generalizing QSP
While standard QSVT is a milestone, it is limited by the parity constraint. In simpler terms, the polynomials must be strictly even or odd, and their coefficients must be real (I'm sure that was a social media at some point, right?).

Recent advancements (e.g., [Sünderhauf, 2023](https://arxiv.org/pdf/2312.00723)) have introduced Generalized versions that remove these restrictions.

- GQET (Generalized Quantum Eigenvalue Transformation): Specifically for Hermitian matrices, GQET applies complex polynomials $P(x)$ to eigenvalues. Unlike standard QET, $P(x)$ can have indefinite parity (e.g., $P(x) = x^2 + x + 1$), enabled by using general $SU(2)$ rotations in the signal processing stage.

- GQSVT (Generalized Quantum Singular Value Transformation): This extends QSVT to arbitrary matrices using the generalized framework, allowing for mixed parity polynomials.

Why does this generalization matter so much? It allows for mixed-parity polynomials, so you are able to implement functions like $e^{-iAt}$ directly without splitting them into sine (odd) and cosine (even) components.
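The parity structure of $e^{-ixt}$ is easy to see in its Chebyshev coefficients. In this NumPy sketch (the values of `t` and `deg` are arbitrary illustrative choices), the cosine part only has even-order coefficients, the sine part only odd-order ones, and their combination is a single complex, mixed-parity polynomial of the kind the generalized framework can handle in one go.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

t, deg = 2.0, 12

# Chebyshev interpolants of the even and odd parts of e^{-ixt} = cos(xt) - i sin(xt)
c_cos = C.chebinterpolate(lambda x: np.cos(t * x), deg)
c_sin = C.chebinterpolate(lambda x: np.sin(t * x), deg)

# Single mixed-parity polynomial with complex coefficients
c_exp = c_cos - 1j * c_sin

x = np.linspace(-1, 1, 1001)
max_err = np.max(np.abs(C.chebval(x, c_exp) - np.exp(-1j * t * x)))

print("odd coefficients of cos part ~ 0: ", np.allclose(c_cos[1::2], 0))
print("even coefficients of sin part ~ 0:", np.allclose(c_sin[0::2], 0))
print(f"max error of the degree-{deg} polynomial: {max_err:.1e}")
```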

Apart from that, finding phase factors for standard QSVT is often a hard optimization problem scaling as $\tilde{O}(d^2)$. In the Generalized (GQSP) framework, phases can often be computed in linear time $\tilde{O}(d)$, making it significantly more practical for more quantum-resource-heavy applications. This can also be seen in the underlying circuit, which simplifies to the following schematic:

![Alt text](GQSP.png)

Ok, enough of this, let's now show how you can use, run, simulate, and provide resource analysis for three GQSP based applications.

## GQSP with Qrisp: [.poly](../../../reference/Block%20Encodings/methods/poly.rst), [.inv](../../../reference/Block%20Encodings/methods/inv.rst), and [.sim](../../../reference/Block%20Encodings/methods/sim.rst)

### Polynomial transformations: [.poly(coeffs)](../../../reference/Block%20Encodings/methods/poly.rst)

In Qrisp, the [BlockEncoding class](../../../reference/Block%20Encodings/BlockEncoding.rst) provides the [.poly(coeffs)](../../../reference/Block%20Encodings/methods/poly.rst) method, which leverages GQET to apply a transformation $p(A)$ to a Hermitian matrix. You simply provide the coefficients, and Qrisp’s internal "autopilot" handles the phase solving and circuit construction.

Example: Applying a custom polynomial. This example applies $p(A) = 1 + 2A + A^2$ to a matrix $A$ and applies the result to a vector $\ket{b}$.
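Since qubitization naturally produces Chebyshev polynomials, it can help to see the same polynomial in that basis. This small NumPy sketch converts the monomial coefficients `[1, 2, 1]` into Chebyshev coefficients and evaluates the largest value of $|p(x)|$ on $[-1, 1]$. It is a classical side calculation for intuition only and makes no claim about how Qrisp processes the coefficients internally.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

mono = np.array([1.0, 2.0, 1.0])         # p(x) = 1 + 2x + x^2, lowest degree first
cheb = C.poly2cheb(mono)                 # same polynomial as coefficients of T_0, T_1, T_2
print("Chebyshev coefficients:", cheb)

# Largest value of |p(x)| on [-1, 1]
x = np.linspace(-1, 1, 1001)
p_max = np.max(np.abs(P.polyval(x, mono)))
print("max |p(x)| on [-1, 1]:", p_max)
```

Here $p(x) = 1.5\,T_0(x) + 2\,T_1(x) + 0.5\,T_2(x)$, and $|p(x)|$ exceeds $1$ on the interval. A block encoding can only hold an operator of norm at most $1$, so $p(A)$ is necessarily encoded up to a normalization factor; the Repeat-Until-Success step in the example returns a normalized state, which is why the comparison with the classical result is done on normalized vectors.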

In [None]:
import numpy as np
from qrisp import *
from qrisp.block_encodings import BlockEncoding

# Define a Hermitian matrix A and a vector b
A = np.array([[0.73255474, 0.14516978, -0.14510851, -0.0391581],
              [0.14516978, 0.68701415, -0.04929867, -0.00999921],
              [-0.14510851, -0.04929867, 0.76587818, -0.03420339],
              [-0.0391581, -0.00999921, -0.03420339, 0.58862043]])
b = np.array([0, 1, 1, 1])

# Generate a block-encoding and apply the polynomial [1, 2, 1]
BA = BlockEncoding.from_array(A)
BA_poly = BA.poly(np.array([1., 2., 1.]))

# Prepare the state |b>
def prep_b():
    operand = QuantumVariable(2)
    prepare(operand, b)
    return operand

@terminal_sampling
def main():
    # Use Repeat-Until-Success (RUS) to apply the non-unitary polynomial
    operand = BA_poly.apply_rus(prep_b)()
    return operand

res_dict = main()
# Sampling returns probabilities, so the square root recovers amplitudes only up to sign
amps = np.sqrt([res_dict.get(i, 0) for i in range(len(b))])

# Classical verification
c = (np.eye(4) + 2 * A + A @ A) @ b
c = c / np.linalg.norm(c)
print("QUANTUM SIMULATION\n", amps, "\nCLASSICAL SOLUTION\n", c)

QUANTUM SIMULATION
 [0.02986319 0.57992489 0.62416744 0.52269524] 
CLASSICAL SOLUTION
 [-0.02986321  0.57992485  0.6241675   0.52269522]
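
The classical check above writes out $I + 2A + A^2$ by hand. Since `[1, 2, 1]` reads the same in both directions, this example does not reveal the coefficient ordering. The sketch below assumes ascending order (`coeffs[k]` multiplies $A^k$), which is an assumption worth verifying against the [.poly](../../../reference/Block%20Encodings/methods/poly.rst) reference. The helpers `matvec` and `poly_apply` are hypothetical names for illustration, not part of Qrisp.

```python
def matvec(M, v):
    """Multiply a list-of-lists matrix M by a vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def poly_apply(coeffs, M, v):
    """Return p(M) v for p(x) = sum_k coeffs[k] * x**k (ascending order, assumed).

    Uses Horner's scheme on vectors, so no matrix power is ever formed.
    """
    result = [0.0] * len(v)
    for c in reversed(coeffs):
        # result <- c * v + M @ result
        result = [c * v_i + r_i for v_i, r_i in zip(v, matvec(M, result))]
    return result

# Small diagonal test case where the ordering is visible: p(x) = 1 + 2 x^2
M = [[2.0, 0.0], [0.0, 3.0]]
v = [1.0, 1.0]
print(poly_apply([1.0, 0.0, 2.0], M, v))
```

Using a non-palindromic coefficient list like this one is a cheap way to make sure the classical reference and the quantum routine agree on the ordering convention.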


### Solving linear systems: [.inv](../../../reference/Block%20Encodings/methods/inv.rst)

Matrix inversion is implemented via Quantum Eigenvalue Transformation (QET) using a polynomial approximation of $1/x$ over the domain $D_{\kappa} = [-1, -1/\kappa] \cup [1/\kappa, 1]$. The polynomial degree scales as $\mathcal{O}(\kappa \log(\kappa/\epsilon))$, where $\kappa$ is the condition number.

In other words, the smaller the target precision $\epsilon$ (or the larger the condition number $\kappa$), the higher the degree of the polynomial needed to approximate the inverse function, which translates directly into more quantum resources for executing the algorithm.
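
To get a feel for this scaling, the following sketch evaluates $\kappa \log(\kappa/\epsilon)$ for a few precisions. The constant hidden by the $\mathcal{O}$ is set to 1 here, which is purely an assumption for illustration; the degree that Qrisp actually chooses may differ. The function name `inv_degree_estimate` is hypothetical.

```python
import math

def inv_degree_estimate(kappa, eps):
    """Order-of-magnitude polynomial degree kappa * log(kappa / eps).

    The prefactor hidden in the O-notation is taken to be 1 (an assumption).
    """
    return math.ceil(kappa * math.log(kappa / eps))

# Tightening eps grows the degree only logarithmically; kappa enters linearly
for eps in (1e-1, 1e-2, 1e-4):
    print(f"kappa=2, eps={eps}: degree ~ {inv_degree_estimate(2.0, eps)}")
```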

For example, let's solve the Quantum Linear System Problem $Ax=b$:

In [None]:
# Assuming A and b are defined as above
kappa = np.linalg.cond(A)
BA = BlockEncoding.from_array(A)

def prep_b():
    qv = QuantumFloat(2)
    prepare(qv, b)
    return qv

# Approximate A^-1 with target precision 0.01, using the condition number kappa computed above
BA_inv = BA.inv(eps=0.01, kappa=kappa)

@terminal_sampling
def main():
    operand = BA_inv.apply_rus(prep_b)()
    return operand

res_dict = main()
amps = np.sqrt([res_dict.get(i, 0) for i in range(len(b))])

# Classical verification
c = (np.linalg.inv(A) @ b) / np.linalg.norm(np.linalg.inv(A) @ b)
print("QUANTUM SIMULATION\n", amps, "\nCLASSICAL SOLUTION\n", c)

QUANTUM SIMULATION
 [0.02711957 0.557081   0.5303464  0.638481  ] 
CLASSICAL SOLUTION
 [0.02944539 0.55423278 0.53013239 0.64102936]


Let's compare the resources of this approach with those of the CKS algorithm mentioned above:

In [None]:
from qrisp.jasp import count_ops, depth

@count_ops(meas_behavior="0")
def main():

    BA = BlockEncoding.from_array(A)
    x = CKS(BA, 0.01, kappa).apply_rus(prep_b)()
    return x

@depth(meas_behavior="0")
def main2():

    BA = BlockEncoding.from_array(A)
    x = CKS(BA, 0.01, kappa).apply_rus(prep_b)()
    return x

# The first dictionary printed is for .inv, the second one for CKS
print(BA_inv.resources(QuantumFloat(2)))
gate_counts_CKS = main()
depth_cks = main2()

print(f"{{'gate counts': {gate_counts_CKS}, 'depth': {depth_cks}}}")


{'gate counts': {'rx': 13, 'p': 149, 't_dg': 1548, 'x': 399, 'cy': 50, 'u3': 750, 'h': 846, 'gphase': 50, 'cx': 3471, 's': 48, 's_dg': 48, 'cz': 150, 't': 1173}, 'depth': 5659}
{'gate counts': {'p': 193, 't_dg': 1548, 'x': 377, 'cy': 50, 'z': 12, 'u3': 753, 'h': 890, 'gphase': 51, 'ry': 2, 'cx': 3516, 's': 70, 's_dg': 70, 'cz': 150, 't': 1173, 'measure': 17}, 'depth': 5134}


Hmm, that's surprising on two counts. First, the resources seem enormous. This is expected, because we're trying to invert a dense matrix, whereas the exponential quantum speedup applies to specific, sparse matrices, meaning matrices with only a small number of nonzero entries. Remember the Laplacian from the previous tutorial: it's mostly zeros. The second surprise is that CKS seems to be slightly more efficient here (depth 5134 versus 5659), which is not what the asymptotic complexity bounds would suggest, right? Let's look at the quality of the solutions in an attempt to cheer up [.inv(eps, kappa)](../../../reference/Block%20Encodings/methods/inv.rst).

In [None]:
BA_inv = BA.inv(eps=0.01, kappa=kappa)

@terminal_sampling
def main():
    operand = BA_inv.apply_rus(prep_b)()
    return operand

res_dict = main()
amps = np.sqrt([res_dict.get(i, 0) for i in range(len(b))])

@terminal_sampling
def main_CKS():
    BA = BlockEncoding.from_array(A)
    
    BA_CKS = CKS(BA, eps=0.01, kappa=kappa)
    
    x = BA_CKS.apply_rus(prep_b)()
    return x

res_dict_CKS = main_CKS()

amps_CKS = np.sqrt([res_dict_CKS.get(i, 0) for i in range(len(b))])

# Classically obtained result for verification
c = (np.linalg.inv(A) @ b) / np.linalg.norm(np.linalg.inv(A) @ b)

print("QUANTUM SIMULATION via ``.inv(epsilon, kappa)``\n", amps, "\nQUANTUM SIMULATION via CKS\n", amps_CKS,"\nCLASSICAL SOLUTION\n", c)

QUANTUM SIMULATION via ``.inv(epsilon, kappa)``
 [0.02711956 0.55708088 0.53034634 0.63848115] 
QUANTUM SIMULATION via CKS
 [0.02711473 0.55709867 0.5303514  0.63846163] 
CLASSICAL SOLUTION
 [0.02944539 0.55423278 0.53013239 0.64102936]


Cheering up didn't help: both solvers return practically identical amplitudes. As a last resort, let's try another example, the Laplacian operator with a custom block encoding (which also provides the answers to the homework given above. You're... welcome?).
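
Before the quantum version, here is a classical sketch of the matrix that the LCU in the next cell is meant to encode. It assumes, following the comments in that cell, that `V` is the cyclic shift with an extra phase of $-1$, so that $2I + V + V^\dagger = 2I - S - S^T$ with $S$ the plain cyclic shift. The helper names `periodic_laplacian` and `count_nonzeros` are hypothetical.

```python
def periodic_laplacian(n):
    """List-of-lists form of 2*I - S - S^T, with S the cyclic shift on n >= 3 basis states."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] += 2.0
        lap[i][(i + 1) % n] -= 1.0  # neighbour to the right, wrapping around
        lap[i][(i - 1) % n] -= 1.0  # neighbour to the left, wrapping around
    return lap

def count_nonzeros(M):
    """Number of nonzero entries of a list-of-lists matrix."""
    return sum(1 for row in M for entry in row if entry != 0)

# Nonzeros grow like 3n while the matrix has n^2 entries: that is the sparsity
for n in (4, 16, 64):
    print(n, count_nonzeros(periodic_laplacian(n)), n * n)
```

Note that every row of this periodic variant sums to zero, so the constant vector is mapped to zero and the matrix is singular. The `kappa` passed in the next cell is therefore best read as an input parameter for resource estimation rather than as the condition number of this particular matrix.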

In [None]:
from qrisp import gphase
from qrisp.jasp import count_ops, depth

def I(qv):
    # Identity: do nothing
    pass

def V(qv):
    # Forward cyclic shift with a global phase -1
    qv += 1
    gphase(np.pi, qv[0])  # multiply by -1

def V_dg(qv):
    # Backward cyclic shift with a global phase -1
    qv -= 1
    gphase(np.pi, qv[0])

unitaries = [I, V, V_dg]
coeffs = np.array([2.0, 1.0, 1.0])
BE = BlockEncoding.from_lcu(coeffs, unitaries)

# kappa is reused from the dense example above (np.linalg.cond(A)), so .inv and CKS get the same input
BE_inv = BE.inv(eps=0.01, kappa=kappa)
print(BE_inv.resources(QuantumFloat(2)))

@count_ops(meas_behavior="0")
def main():

    x = CKS(BE, 0.01, kappa).apply_rus(prep_b)()
    return x

@depth(meas_behavior="0")
def main2():

    x = CKS(BE, 0.01, kappa).apply_rus(prep_b)()
    return x

gate_counts_CKS = main()
depth_cks = main2()

print(f"{{'gate counts': {gate_counts_CKS}, 'depth': {depth_cks}}}")


{'gate counts': {'rx': 13, 'p': 74, 't_dg': 348, 'x': 149, 'u3': 150, 'h': 346, 'gphase': 50, 'cx': 996, 's': 98, 's_dg': 48, 't': 298, 'measure': 50}, 'depth': 1834}
{'gate counts': {'p': 118, 't_dg': 348, 'x': 127, 'z': 12, 'u3': 153, 'h': 390, 'gphase': 51, 'ry': 2, 'cx': 1041, 's': 120, 's_dg': 70, 't': 298, 'measure': 65}, 'depth': 1846}


And this, ladies and gentlemen interested in block encodings, is how state-of-the-art quantum resource estimation is done. Simple, huh?

Oh, and a comment on the results: CKS ain't too shabby after all! This is why, fellow readers, you should always do your due diligence and analyze the resources. "Big O" notation can be misleading because it omits constant factors. Well, in this case the results were actually expected because the circuits are extremely similar. But still, estimate your quantum resources!
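
Raw gate dictionaries are hard to compare by eye, so a small post-processing step can help. The sketch below copies the two Laplacian outputs printed above (the first dictionary belongs to `.inv`, the second to CKS) and condenses them into a few headline numbers. The helper `summarize` is a hypothetical name, and which metrics matter most depends on your target hardware.

```python
# Gate counts and depths copied from the Laplacian outputs printed above
inv_counts = {'rx': 13, 'p': 74, 't_dg': 348, 'x': 149, 'u3': 150, 'h': 346,
              'gphase': 50, 'cx': 996, 's': 98, 's_dg': 48, 't': 298, 'measure': 50}
cks_counts = {'p': 118, 't_dg': 348, 'x': 127, 'z': 12, 'u3': 153, 'h': 390,
              'gphase': 51, 'ry': 2, 'cx': 1041, 's': 120, 's_dg': 70, 't': 298,
              'measure': 65}

def summarize(counts, depth):
    """Condense a gate-count dictionary into a few headline metrics."""
    return {
        "total_ops": sum(counts.values()),                       # includes measurements
        "t_count": counts.get("t", 0) + counts.get("t_dg", 0),   # T and T-dagger gates
        "cx": counts.get("cx", 0),                               # two-qubit CNOT gates
        "depth": depth,
    }

inv_summary = summarize(inv_counts, depth=1834)
cks_summary = summarize(cks_counts, depth=1846)
print(".inv:", inv_summary)
print("CKS: ", cks_summary)
```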

### Hamiltonian simulation: [.sim](../../../reference/Block%20Encodings/methods/sim.rst)

For a block-encoded Hamiltonian $H$, the [.sim(t, N)](../../../reference/Block%20Encodings/methods/sim.rst) method approximates the unitary evolution $e^{-itH}$. This utilizes the Jacobi-Anger expansion into Bessel functions:

$$e^{-it\cos(\theta)} \approx \sum_{n=-N}^{N}(-i)^nJ_n(t)e^{in\theta}$$

The truncation error decreases super-exponentially with the order $N$ once $N$ is larger than about $|t|$.
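
The convergence of the truncated Jacobi-Anger sum can be checked classically for a single angle $\theta$. The sketch below uses only the Python standard library and evaluates $J_n(t)$ from its power series, which is adequate for the small $t$ used here but not a general-purpose Bessel routine. The function names `bessel_j` and `jacobi_anger` are hypothetical helpers.

```python
import cmath
import math

def bessel_j(n, t, terms=40):
    """J_n(t) from its power series; adequate for the small |t| used here."""
    if n < 0:
        # J_{-n}(t) = (-1)^n J_n(t) for integer n
        return (-1) ** (-n) * bessel_j(-n, t, terms)
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(k + n)) * (t / 2) ** (2 * k + n)
        for k in range(terms)
    )

def jacobi_anger(t, theta, N):
    """Truncated sum over n = -N..N of (-i)^n J_n(t) e^{i n theta}."""
    return sum(
        (-1j) ** n * bessel_j(n, t) * cmath.exp(1j * n * theta)
        for n in range(-N, N + 1)
    )

t, theta = 0.5, 0.3
exact = cmath.exp(-1j * t * math.cos(theta))
errors = {N: abs(jacobi_anger(t, theta, N) - exact) for N in (1, 2, 4, 8)}
for N, err in errors.items():
    print(f"N={N}: truncation error {err:.2e}")
```

For $t = 0.5$, doubling $N$ a few times is enough to push the error far below typical sampling noise, which is why a modest order such as $N = 8$ is a reasonable choice for short evolution times.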

We can now simply simulate an Ising Hamiltonian:

In [None]:
from qrisp import *
from qrisp.block_encodings import BlockEncoding
from qrisp.operators import X, Z

def create_ising_hamiltonian(L, J, B):
    return sum(-J * Z(i) * Z(i + 1) for i in range(L-1)) + sum(B * X(i) for i in range(L))

L = 4
H = create_ising_hamiltonian(L, 0.25, 0.5)
BE = BlockEncoding.from_operator(H)

@terminal_sampling
def main(t):
    # Evolve state |0> for time t using order N=8
    BE_sim = BE.sim(t=t, N=8)
    operand = BE_sim.apply_rus(lambda: QuantumFloat(L))()
    return operand

res_dict = main(0.5)
print(res_dict)

{0.0: 0.7794808490061743, 1.0: 0.050481999502054155, 8.0: 0.05048195852386222, 2.0: 0.04959365563071014, 4.0: 0.04959364445483961, 12.0: 0.0032753847863541677, 3.0: 0.0032753831565397154, 9.0: 0.003269415241677061, 6.0: 0.0032556919711589703, 5.0: 0.00321428350820277, 10.0: 0.0032142823440495898, 14.0: 0.00021251063330640093, 7.0: 0.00021251056054682717, 13.0: 0.00021228243017926458, 11.0: 0.00021228228466011706, 15.0: 1.3865965684651424e-05}
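
Unlike the previous two examples, this cell has no classical verification. A minimal sketch of one, assuming NumPy is available and using exact diagonalization of the $16 \times 16$ Ising matrix, could look as follows. The helpers `embed`, `ising_matrix`, and `evolved_probabilities` are hypothetical names, and the qubit ordering (site 0 as the leftmost tensor factor) is an assumption that may not match the integer labels Qrisp prints. The all-zeros entry `probs[0]` does not depend on that convention.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def embed(op, site, L):
    """Place a single-qubit operator on `site` of an L-qubit register (site 0 = leftmost factor)."""
    out = np.array([[1.0]])
    for i in range(L):
        out = np.kron(out, op if i == site else I2)
    return out

def ising_matrix(L, J, B):
    """Dense matrix of -J * sum Z_i Z_{i+1} + B * sum X_i on an open chain."""
    H = np.zeros((2**L, 2**L))
    for i in range(L - 1):
        H -= J * embed(Z, i, L) @ embed(Z, i + 1, L)
    for i in range(L):
        H += B * embed(X, i, L)
    return H

def evolved_probabilities(H, t):
    """Measurement probabilities of e^{-iHt}|0...0> via the eigendecomposition of H."""
    evals, evecs = np.linalg.eigh(H)
    psi0 = np.zeros(H.shape[0])
    psi0[0] = 1.0
    psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))
    return np.abs(psi_t) ** 2

probs = evolved_probabilities(ising_matrix(4, 0.25, 0.5), 0.5)
print("P(all zeros) =", probs[0], " sum =", probs.sum())
```

The printed probability of the all-zeros state can be compared against the `0.0` entry of the sampled dictionary above.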


## Conclusion

As you're hopefully convinced by now, having block encodings as programming abstractions lowers the barrier to entry for classical developers to dive into the field of state-of-the-art quantum computing algorithms, while at the same time allowing researchers to focus on the algorithm (the function to be applied) rather than the software implementation!

Here is a quick collection of take-home messages after this thorough (and math-heavy) tutorial. We have transitioned from basic matrix representations to complex functional analysis on a quantum computer:

- Qubitization is the engine that enables walking through 2D subspaces.

- QSP/GQSP is the steering wheel, allowing you to transform $A$ into almost any $f(A)$.

- Qrisp is the autopilot, solving for phase angles and automating resource management and uncomputation.

- Always do your due diligence and estimate the quantum resources of your state-of-the-art, top-notch block encodings and other algorithmic approaches.

If you’re ready to get your hands dirty, the next tutorial puts these tools to work on a 10-qubit Heisenberg model. We’ll show you how to use Gaussian filtering to get a better ground state, using the Chebyshev polynomials you're now a pro at. You will also use a Repeat-Until-Success (RUS) trick in Qrisp to lock in your desired state. Finally, we’ll double-check the work together by watching the energy levels drop, confirming that you’ve successfully found the ground state. It’s the perfect way to see how all this theory actually solves a real-world physics problem.
