<a href="https://colab.research.google.com/github/deltorobarba/sciences/blob/master/maths.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color="blue">**Mathematics 🦋**

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0000.png)

#### <font color="blue">**Quantum Algorithms**

###### *Boolean Functions*

https://de.m.wikipedia.org/wiki/Boolesche_Funktion

https://de.m.wikipedia.org/wiki/Boolesche_Algebra

https://de.m.wikipedia.org/wiki/Disjunktive_Normalform

https://de.m.wikipedia.org/wiki/Konjunktive_Normalform

https://en.m.wikipedia.org/wiki/Analysis_of_Boolean_functions

Fourier analysis of functions on the Boolean cube

###### *Quantum Turing machine*

https://en.m.wikipedia.org/wiki/Quantum_Turing_machine

Quantum Turing machines can be related to classical and probabilistic Turing machines in a framework based on transition matrices. https://en.m.wikipedia.org/wiki/Stochastic_matrix

A quantum Turing machine with postselection was defined by Scott Aaronson, who showed that the class of polynomial time on such a machine (PostBQP) is equal to the classical complexity class PP.

A way of understanding the quantum Turing machine (QTM) is that it generalizes the classical Turing machine (TM) in the same way that the quantum finite automaton (QFA) generalizes the deterministic finite automaton (DFA). In essence, the internal states of a classical TM are replaced by pure or mixed states in a Hilbert space; the transition function is replaced by a collection of unitary matrices that map the Hilbert space to itself.

###### *Pure State, Mixed State and POVM and Square root measurement (pretty good measurement)*

An $n$-dimensional **pure state** is $|\psi\rangle=\sum_{i=1}^n \alpha_i|i\rangle$, where $|i\rangle$ is the $n$-dimensional unit vector that has a 1 only at position $i$, the $\alpha_i$'s are complex numbers called the amplitudes, and $\sum_{i \in[n]}\left|\alpha_i\right|^2=1$.

An $n$-dimensional **mixed state** (or density matrix) $\rho=\sum_{i=1}^n p_i\left|\psi_i\right\rangle\left\langle\psi_i\right|$ is a mixture of pure states $\left|\psi_1\right\rangle, \ldots,\left|\psi_n\right\rangle$ prepared with probabilities $p_1, \ldots, p_n$, respectively. The eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\rho$ are non-negative reals and satisfy $\sum_{i \in[n]} \lambda_i=1$.

If $\rho$ is pure (i.e., $\rho=|\psi\rangle\langle\psi|$ for some $|\psi\rangle$), then one of the eigenvalues is 1 and the others are 0.

To obtain classical information from $\rho$, one could apply a **POVM (positive-operator-valued measure)** to the state $\rho$. An $m$-outcome POVM is specified by a set of positive semidefinite matrices $\left\{M_i\right\}_{i \in[m]}$ with the property $\sum_i M_i=I$, where $I$ is the identity. When this POVM is applied to the mixed state $\rho$, the probability of the $j$-th outcome is given by $\operatorname{Tr}\left(M_j \rho\right)$.
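As a small numerical illustration of the rule $\operatorname{Tr}(M_j\rho)$, the sketch below applies a three-outcome qubit POVM to a pure state. The choice of the "trine" POVM, the input state $|+\rangle$, and the helper names `povm_probabilities` and `trine_povm` are assumptions made for this example only.

```python
import numpy as np

def povm_probabilities(povm, rho):
    """Outcome probabilities Tr(M_j rho), one per POVM element M_j."""
    return np.array([np.real(np.trace(M @ rho)) for M in povm])

def trine_povm():
    """Three-outcome qubit POVM: M_k = (2/3)|v_k><v_k|, Bloch vectors 120 degrees apart."""
    elements = []
    for k in range(3):
        angle = 2 * np.pi * k / 3  # Bloch-sphere angle in the x-z plane
        v = np.array([np.cos(angle / 2), np.sin(angle / 2)])
        elements.append((2 / 3) * np.outer(v, v))
    return elements

# Illustrative input state: rho = |+><+|
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

povm = trine_povm()
probs = povm_probabilities(povm, rho)

print("Elements sum to identity:", np.allclose(sum(povm), np.eye(2)))
print("Outcome probabilities:", probs, "sum =", probs.sum())
```

None of the three elements is a projector, yet the probabilities are non-negative and sum to 1, which is all the POVM conditions guarantee.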

**Square root measurement (pretty good measurement)**

The square root measurement (SRM), also known as the pretty good measurement (PGM), is a measurement introduced by Paul Hausladen and William Wootters in 1994. It is a quantum measurement that can be **used to distinguish between a set of non-orthogonal quantum states**.

* The SRM is constructed from the ensemble to be distinguished. For states $\rho_1, \ldots, \rho_m$ with prior probabilities $p_1, \ldots, p_m$, let $\rho=\sum_i p_i \rho_i$ be the average state. The measurement operators are $M_i=\rho^{-1/2}\, p_i \rho_i\, \rho^{-1/2}$, where the inverse square root is taken on the support of $\rho$. The name refers to this $\rho^{-1/2}$ factor.

* The SRM has several desirable properties. It is simple to write down, and it is "pretty good" in a precise sense: its success probability is at least the square of the optimal success probability (a bound due to Barnum and Knill). It therefore performs well whenever the optimal measurement does, for example on nearly orthogonal states.

* The SRM has been used in a variety of applications in quantum computing, including quantum cryptography, quantum communication, and quantum machine learning.

Here are some of the key properties of the square root measurement:

* It is a valid POVM: every $M_i$ is positive semidefinite and $\sum_i M_i = I$ on the support of $\rho$.
* It is not projective in general. When the states are linearly independent pure states, however, the $M_i$ become mutually orthogonal rank-one projectors.
* It does not always minimize the probability of error, but it is optimal for some symmetric ensembles (for example, two equally likely pure states).
* It is near-optimal in general, in the sense of the squared-success-probability bound above.

The square root measurement is a useful default for state discrimination: it has a closed-form expression, whereas the optimal measurement usually has to be found numerically.
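The standard construction $M_i=\rho^{-1/2} p_i \rho_i \rho^{-1/2}$ with $\rho=\sum_i p_i\rho_i$ can be sketched in a few lines of NumPy. The ensemble below (the states $|0\rangle$ and $|+\rangle$ with equal priors) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def inverse_sqrt_psd(A, tol=1e-12):
    """Inverse square root of a positive semidefinite matrix on its support."""
    vals, vecs = np.linalg.eigh(A)
    inv_sqrt = np.array([1 / np.sqrt(v) if v > tol else 0.0 for v in vals])
    # Scale each eigenvector column, then recombine: sum_j inv_sqrt_j |v_j><v_j|
    return (vecs * inv_sqrt) @ vecs.conj().T

def pretty_good_measurement(states, priors):
    """Return the PGM elements M_i = rho^{-1/2} p_i rho_i rho^{-1/2}."""
    rho = sum(p * s for p, s in zip(priors, states))
    r = inverse_sqrt_psd(rho)
    return [r @ (p * s) @ r for p, s in zip(priors, states)]

# Illustrative ensemble: |0> and |+> with equal prior probability
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)]
priors = [0.5, 0.5]

pgm = pretty_good_measurement(states, priors)

# Average probability of guessing the prepared state correctly
p_success = sum(p * np.real(np.trace(M @ s)) for p, M, s in zip(priors, pgm, states))
print("Sum of elements is identity:", np.allclose(sum(pgm), np.eye(2)))
print("PGM success probability:", p_success)
```

For this two-state ensemble the PGM coincides with the optimal (Helstrom) measurement, so `p_success` equals $\tfrac12\left(1+\sqrt{1-|\langle 0|+\rangle|^2}\right)$.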

**What is an example of a set of non-orthogonal quantum states?**

A simple example is the following pair of states:

```
|ψ⟩ = |0⟩
|ϕ⟩ = |+⟩ = (|0⟩ + |1⟩)/√2
```

These two states are non-orthogonal because their inner product is not zero:

```
⟨ψ|ϕ⟩ = ⟨0|(|0⟩ + |1⟩)/√2 = 1/√2 ≠ 0
```

In other words, it is not possible to distinguish between these two states with certainty. By contrast, the normalized states (|0⟩ + |1⟩)/√2 and (|0⟩ - |1⟩)/√2 are orthogonal, since their inner product is (1 - 1)/2 = 0, and they can be distinguished perfectly.

Another example of a set of non-orthogonal quantum states is the set of all qubit states that lie on the Bloch sphere. The Bloch sphere is a unit sphere in three dimensions, and each point on the sphere represents a possible state of a qubit. Any two points on the Bloch sphere that are not diametrically opposite are non-orthogonal.
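The Bloch-sphere statement can be checked numerically: for two pure qubit states with Bloch vectors $\vec a$ and $\vec b$, the overlap is $|\langle\psi|\phi\rangle|^2 = (1+\vec a\cdot\vec b)/2$, which vanishes only for antipodal points. The sketch below uses the usual angle parametrization; the function names are illustrative.

```python
import numpy as np

def state_from_angles(theta, phi):
    """Pure qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def bloch_vector(theta, phi):
    """Point on the unit sphere with polar angle theta and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def overlap_squared(psi, phi_state):
    """|<psi|phi>|^2 (np.vdot conjugates its first argument)."""
    return abs(np.vdot(psi, phi_state)) ** 2

# Two arbitrary, non-antipodal points
a_angles, b_angles = (0.4, 1.1), (2.0, -0.3)
lhs = overlap_squared(state_from_angles(*a_angles), state_from_angles(*b_angles))
rhs = (1 + bloch_vector(*a_angles) @ bloch_vector(*b_angles)) / 2

# Antipodal point of a_angles: (pi - theta, phi + pi)
antipode = (np.pi - a_angles[0], a_angles[1] + np.pi)
antipodal_overlap = overlap_squared(state_from_angles(*a_angles),
                                    state_from_angles(*antipode))

print(lhs, rhs, antipodal_overlap)
```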

Non-orthogonal quantum states are a fundamental part of quantum mechanics. They play a role in many quantum phenomena, such as quantum entanglement and quantum computing.

**When do they occur?**

Non-orthogonal quantum states occur whenever two quantum states have a non-zero overlap (inner product). This can happen in a variety of ways, such as:

* When a system can be prepared in different superpositions of the same basis states, for example |0⟩ and |+⟩.
* When information is deliberately encoded in overlapping states, so that no measurement can perfectly distinguish between them.
* When initially orthogonal states are passed through a noisy channel, which generally reduces their distinguishability.

Non-orthogonal quantum states can be difficult to distinguish, but there are a number of techniques that can be used to do so.

* A projective measurement projects the quantum state onto one of a set of mutually orthogonal states, so no projective measurement can identify non-orthogonal states with certainty. Standard strategies are minimum-error discrimination (the Helstrom measurement for two states, or the pretty good measurement for larger sets) and unambiguous discrimination, which never gives a wrong answer but sometimes returns an inconclusive outcome.

* Non-orthogonal quantum states appear in many quantum phenomena. In quantum entanglement, two or more quantum systems are linked together in such a way that they cannot be described independently, that is, their joint state is not a product state. Non-orthogonality is a separate property: it concerns the overlap between two states of the same system.

* In quantum computing and quantum cryptography, non-orthogonal quantum states are used as a resource. For example, the BB84 key distribution protocol encodes bits in the non-orthogonal states |0⟩, |1⟩, |+⟩ and |−⟩, so that an eavesdropper cannot read them without causing a detectable disturbance.

The study of non-orthogonal quantum states is a rapidly developing field. As our understanding of quantum mechanics grows, we are finding new and exciting ways to use non-orthogonal quantum states to solve problems and to create new technologies.

**Yes, there is a difference between the square root measurement and the positive-operator-valued measure (POVM).**

A square root measurement is a special type of POVM, while "POVM" is the name of the general class of quantum measurements.

A POVM is defined by a set of positive semidefinite operators, called effect operators, that sum to the identity. Projective measurements are the special case in which the effects are mutually orthogonal projectors. Every measurement, projective or not, returns exactly one outcome per run; what differs is the outcome statistics.

The square root measurement is the POVM built from a given ensemble $\{p_i, \rho_i\}$ via $M_i=\rho^{-1/2}\, p_i \rho_i\, \rho^{-1/2}$ with $\rho=\sum_i p_i \rho_i$. It is not projective in general, although it becomes projective when the states are linearly independent pure states.

In short, the POVM is the general framework, and the square root measurement is one specific, near-optimal choice within it for the task of state discrimination.

Here is a table summarizing the key differences between the square root measurement and the POVM:

| Feature | Square root measurement | POVM |
|---|---|---|
| What it is | A specific measurement built from an ensemble of states | The general class of quantum measurements |
| Projective | Not in general | Not in general |
| Defined by | $M_i=\rho^{-1/2}\, p_i \rho_i\, \rho^{-1/2}$ | Any positive semidefinite $\{M_i\}$ with $\sum_i M_i=I$ |
| Generality | Less | More |
| Typical use | Near-optimal discrimination of a known set of states | Any measurement task |

###### *Analysis of state identification problems done by Fourier analysis on the Boolean cube*

The Fourier analysis of Boolean functions is a powerful tool for analyzing state identification problems in quantum computing. A state identification problem is the problem of determining which state, out of a known set of candidates, a quantum system is in.

* The Boolean cube $\{0,1\}^n$ is the set of all $n$-bit strings. It labels the computational basis states of an $n$-qubit system.

* The Fourier transform of a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ represents it as a sum of parity functions $\chi_S(x)=(-1)^{S \cdot x}$, one for each $S \in\{0,1\}^n$: $f(x)=\sum_S \hat{f}(S) \chi_S(x)$ with $\hat{f}(S)=2^{-n} \sum_x f(x) \chi_S(x)$.

* This representation is useful when the candidate states are defined by Boolean functions. For example, for $F(x)=(-1)^{f(x)}$ and the phase state $2^{-n/2} \sum_x F(x)|x\rangle$, applying a Hadamard gate to every qubit yields $\sum_S \hat{F}(S)|S\rangle$, so the probability of measuring the string $S$ is $\hat{F}(S)^2$.

* For example, for a single qubit ($n=1$) the Boolean cube has two points, 0 and 1, and $F$ has two Fourier coefficients: $\hat{F}(0)=(F(0)+F(1))/2$ and $\hat{F}(1)=(F(0)-F(1))/2$. The probability of obtaining a measurement outcome of 0 after the Hadamard gate is $\hat{F}(0)^2$.

* The Fourier analysis of Boolean functions applies to systems of any size. However, a function on $n$ bits has $2^n$ Fourier coefficients, so computing all of them by brute force grows exponentially with the size of the system.

Here are some of the applications of Fourier analysis of Boolean functions in quantum computing:

* **State identification:** As mentioned above, Fourier analysis can be used to solve state identification problems. This is a fundamental problem in quantum computing, as it is necessary for many other tasks, such as quantum algorithms and quantum communication.
* **Quantum error correction:** Fourier analysis can be used to design quantum error correction codes. These codes are used to protect quantum states from errors caused by noise.
* **Quantum machine learning:** Fourier analysis can be used to develop quantum machine learning algorithms. These algorithms can be used to learn from data in a quantum-mechanical way.
* **Quantum cryptography:** Fourier analysis can be used to develop quantum cryptography protocols. These protocols can be used to securely transmit information over a quantum channel.
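To make the Fourier expansion on the Boolean cube concrete, here is a brute-force sketch that computes all coefficients $\hat f(S)=2^{-n}\sum_x f(x)(-1)^{S\cdot x}$ of a $\pm 1$-valued function. The 3-bit majority function is used purely as an illustrative input, and the helper names are not taken from any library.

```python
import itertools

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f defined on n-bit tuples, by direct summation."""
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for s in points:
        total = 0.0
        for x in points:
            # Parity function chi_S(x) = (-1)^(S . x)
            parity = (-1) ** sum(si * xi for si, xi in zip(s, x))
            total += f(x) * parity
        coeffs[s] = total / 2 ** n
    return coeffs

def majority_pm1(x):
    """(-1)^MAJ(x): -1 if most bits are 1, else +1."""
    return -1 if sum(x) > len(x) / 2 else 1

coeffs = fourier_coefficients(majority_pm1, 3)
for s, c in coeffs.items():
    print(s, c)

# Parseval: for a +/-1-valued function the squared coefficients sum to 1,
# matching the fact that outcome probabilities f_hat(S)^2 sum to 1.
parseval = sum(c ** 2 for c in coeffs.values())
print("Sum of squares:", parseval)
```

The double loop costs $4^n$ operations; a fast Walsh-Hadamard transform would reduce this to $n2^n$, but the direct version keeps the definition visible.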


###### *Classically simulating quantum systems*

How do you best classically simulate quantum systems?
The answer is, as always, it depends. Each classical method has its own set of strengths and weaknesses, and its own types of systems or observables where it excels or underperforms.

Today, we have added to the pool of viable simulation methods with LOWESA, and we benchmarked it on everyone's favorite example: 127-qubit time evolution dynamics in a 2D-connected spin-lattice presented by IBM in their recent paper (Nature 618, p. 500-505 (2023)). First, we approximately reconstruct the entire expectation landscape spanned by the parameters of the quantum circuit, i.e., the local field and coupling coefficients of the system Hamiltonian. This can be seen as building a surrogate function for the true quantum expectation landscape. After this initial overhead, we can evaluate our surrogate for new circuit parameters in fractions of a second to a few seconds on a laptop for the most complicated 127-qubit cases. Changing the Hamiltonian coefficients or the simulation time can all be done in post-processing and is very fast.

In summary, we achieve similar accuracy to leading classical simulation methods, and the more expectation values you want to calculate, the larger the wall-time gap to many of these methods becomes. We believe our algorithm may become the go-to for people who want to quickly scan large portions of the expectation landscape.

Here are some links to our work:
arXiv: https://lnkd.in/ei9NT_uN
scirate: https://lnkd.in/eS9wEZXP
my twitter/X thread (shameless plug): https://lnkd.in/eW9KxzV6

This was one of my projects during the Los Alamos National Laboratory #quantumcomputing summer school. It was very fun and rewarding! Thank you for giving me this opportunity. Thank you to Lukasz Cincio and Prof. Zoë Holmes for their guidance, and to Enrico Fontana for the vital discussions about the LOWESA algorithm.


###### *Miscellaneous*

[Jens Eisert: What do we do with Quantum Computers?](https://youtu.be/70MD0pe6HGs) and [(2023) Disentangling Hype from Practicality](https://www.researchgate.net/publication/372075046_Disentangling_Hype_from_Practicality_On_Realistically_Achieving_Quantum_Advantage)

**Where does a quantum computer have an (exponential) advantage over a classical computer in machine learning tasks?**

* Classical Data (learn classical Boolean circuits):
  * average error: no
  * worst-case error: no

* Quantum Data (learn unknown CPTP maps over known observables):
  * average error: no - classical ML can learn unknown state ρ or observable O very well
  * worst-case error: yes (predicting expectation values of Pauli operators in an unknown n-qubit quantum state ρ)

*Pending questions*

What is the Chernoff bound?

How can you learn the phase of a Heisenberg-Weyl operator?

What is the phase of a Heisenberg-Weyl operator?

What is a symplectic inner product?


**what is meant by the parity of the two qubits?**

The "parity" of two qubits refers to whether the number of qubits in the state |1⟩ is even or odd.

The parity operator for two qubits, which could be denoted as Z ⊗ Z in terms of Pauli matrices, acts on a two-qubit state |ψ⟩ = α|00⟩ + β|01⟩ + γ|10⟩ + δ|11⟩ as follows:

(Z ⊗ Z)|ψ⟩ = α|00⟩ - β|01⟩ - γ|10⟩ + δ|11⟩

In this case, the eigenvalues of the parity operator are +1 and -1. States |00⟩ and |11⟩ have even parity (0 and 2 qubits are in the state |1⟩, respectively, both of which are even numbers), so the parity operator gives them a factor of +1. States |01⟩ and |10⟩ have odd parity (1 qubit is in the state |1⟩), so the parity operator gives them a factor of -1.

Measuring the expectation value of the parity operator tells you about the relative likelihood of being in an even parity state versus an odd parity state. For example, if a two-qubit system is in the state |ψ⟩ = 1/√2(|00⟩ + |11⟩), then the expectation value of the parity operator would be +1, indicating that measurements of both qubits will always yield the same result, either both 0 or both 1. On the other hand, if the system is in the state |ψ⟩ = 1/√2(|01⟩ + |10⟩), then the expectation value of the parity operator would be -1, indicating that measurements of both qubits will always yield opposite results.
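The two expectation values quoted above can be verified directly. This is a minimal sketch using dense matrices; the variable names are illustrative.

```python
import numpy as np

Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)  # parity operator, diagonal entries (+1, -1, -1, +1)

def expectation(op, psi):
    """<psi|op|psi> for a normalized state vector psi."""
    return np.real(np.vdot(psi, op @ psi))

# Basis order: |00>, |01>, |10>, |11>
even_state = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
odd_state = np.array([0, 1, 1, 0]) / np.sqrt(2)   # (|01> + |10>)/sqrt(2)

parity_even = expectation(ZZ, even_state)
parity_odd = expectation(ZZ, odd_state)
print("Even-parity state:", parity_even)
print("Odd-parity state:", parity_odd)
```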

John Preskill - Quantum Computation

https://youtu.be/w08pSFsAZvE

**What does "bounded fan-in (a general form of locality restriction)" mean in the context of quantum computing?**

In the context of quantum computing, the concept of "fan-in" relates to the number of input qubits that a quantum gate or operation can accept.

When we say a system or a circuit has "bounded fan-in," it means that there's a fixed limit, independent of the circuit size, to the number of inputs any single gate or operation can accept.

"Bounded fan-in" is a kind of "locality restriction." In a locality restriction, we limit how our quantum operations can act. In physical systems, this is often motivated by the fact that most physical interactions are local -- for instance, each qubit in a quantum computer might only interact directly with a small number of nearby qubits.

A gate or operation with bounded fan-in can only act on a fixed number of qubits. For example, if we're working with a two-qubit bounded fan-in, our gates can only act on one or two qubits at a time -- we can't have a single gate that acts on three or more qubits simultaneously. This is a general form of locality restriction because it restricts how 'far' each operation can 'reach' in the space of qubits.

This is important in quantum computing because it affects the kinds of circuits and algorithms we can use. It's also important when thinking about how to physically implement a quantum computer, as real-world quantum systems often have natural locality restrictions due to physical constraints.

**What does "strings over n bits" mean?**

In the context of computer science and mathematics, a "string over n bits" usually refers to a sequence of binary digits (bits) of length n. Each bit can either be a 0 or a 1.

For instance, if n = 3, then the possible strings over 3 bits are: 000, 001, 010, 011, 100, 101, 110, 111. These are all the possible combinations of 3 bits.

The total number of possible strings over n bits is 2^n, because each bit has 2 possible states (0 or 1), and there are n such bits. This principle is widely used in areas such as digital computing, information theory, cryptography, etc.
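The enumeration and the $2^n$ count can be reproduced with the standard library. The function name `bit_strings` is just an illustrative choice.

```python
from itertools import product

def bit_strings(n):
    """All strings over n bits, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

strings_3 = bit_strings(3)
print(strings_3)
print(len(strings_3) == 2 ** 3)
```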

**What does "covering number bounds" mean in quantum computing?**

Covering number bounds in quantum computing (and more generally in mathematics) are bounds on how many objects of a given size are needed to "cover" another object.

The concept of a covering number is used in various mathematical contexts, but the general idea remains the same. The covering number of a set (in a metric space) is the smallest number of "balls" of a given radius that can cover the entire set.

In quantum computing, this might be used in the context of quantum states or operations. For instance, for quantum states in a Hilbert space, one might ask: how many balls of radius ε, each centered at some quantum state and measured in a chosen distance such as the trace distance, are needed to cover the entire space of possible quantum states?

Covering numbers are important in quantum computing because they can help us understand the complexity of various tasks, including the complexity of approximating a given quantum state or operation.

Bounding the covering number, or providing covering number bounds, is important because it gives us a measure of the "size" or complexity of the space we're dealing with. These bounds can help us understand the resources required for tasks such as quantum state preparation, quantum channel simulation, and so on.

###### *Packing Nets*

"we cover the entire set of CPTP maps F with a maximal packing net"

page 3 in: Information-theoretic bounds on quantum advantage in machine learning

In this context a packing net is a notion from metric geometry, not a quantum circuit. Given a set $F$ with a distance $d$ and a scale $\varepsilon>0$, an $\varepsilon$-packing net is a subset of $F$ whose elements are pairwise more than $\varepsilon$ apart. It is maximal if no further element of $F$ can be added without violating this separation.

A maximal packing net has two useful properties:

* Its elements are pairwise distinguishable at scale $\varepsilon$ (the packing property).
* Every element of $F$ lies within $\varepsilon$ of some element of the net, since otherwise it could be added (the covering property). This is why the quoted sentence can speak of covering the set of CPTP maps with a packing net.

The size of the largest packing net (the packing number) is closely related to the covering number: the $\varepsilon$-covering number is at most the $\varepsilon$-packing number, which in turn is at most the $\varepsilon/2$-covering number.

Roughly speaking, packing nets enter information-theoretic lower bounds as follows: if a learning algorithm achieves an error well below $\varepsilon$, it can identify which element of the net it was given, so it must extract about $\log$(packing number) bits of information. This links the number of samples needed for learning to the size of the net.
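In the metric-space sense used in the quoted paper, a maximal packing net is a set of points that are pairwise more than $\varepsilon$ apart and to which no further point can be added. The greedy sketch below builds one for a finite point cloud and checks that it is also an $\varepsilon$-covering. Random points on a sphere with the Euclidean distance stand in for the set of CPTP maps; that substitution, the seed, and the function name are assumptions for illustration.

```python
import numpy as np

def greedy_packing_net(points, eps):
    """Greedily keep points that are more than eps away from all kept points."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > eps for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)  # project onto the unit sphere

eps = 0.5
net = greedy_packing_net(points, eps)

# Packing property: distinct net points are more than eps apart
pairwise = np.linalg.norm(net[:, None, :] - net[None, :, :], axis=-1)
off_diagonal = pairwise[~np.eye(len(net), dtype=bool)]

# Covering property: every point is within eps of its nearest net point
nearest = np.linalg.norm(points[:, None, :] - net[None, :, :], axis=-1).min(axis=1)

print("Net size:", len(net))
print("Packing holds:", bool(np.all(off_diagonal > eps)))
print("Covering holds:", bool(np.all(nearest <= eps)))
```

The greedy result is maximal but not necessarily of maximum size, so its length is only a lower bound on the packing number of the point cloud.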

###### *Eigenvalues, Measurements and Expectation Values*

https://en.m.wikipedia.org/wiki/Weyl%27s_inequality

**what is meant by trace distance of the state prior to any measurements?**

In the context of quantum mechanics and quantum information theory, the trace distance is a measure of the distinguishability of two quantum states. In a practical sense, it determines the maximum probability of correctly telling the two states apart: if one of the two states is prepared with equal prior probability, the best possible measurement identifies it correctly with probability 1/2 * (1 + D(ρ, σ)).

The trace distance between two quantum states ρ and σ is defined as:

D(ρ, σ) = 1/2 * Tr[|ρ - σ|]

where Tr denotes the trace, and |A| refers to the absolute value of the matrix A, defined as the positive square root of A†A. For a Hermitian matrix such as ρ - σ, Tr|A| is the sum of the absolute values of the eigenvalues of A.

The trace distance has a value between 0 and 1, where 0 indicates that the two states are identical, and 1 means the states are completely distinguishable.

When the trace distance is mentioned in relation to "the state prior to any measurements", it generally refers to assessing how different two quantum states are before any action is taken to measure or otherwise alter these states. This can be important when considering the evolution of quantum systems, or assessing the impact of quantum noise or errors on a quantum state.

```python
import numpy as np

def trace_distance(state1, state2):
  """Calculates the trace distance between two quantum states.

  Args:
    state1: A numpy array representing the first density matrix.
    state2: A numpy array representing the second density matrix.

  Returns:
    The trace distance 0.5 * Tr|state1 - state2|.
  """

  # Calculate the difference between the two density matrices (Hermitian).
  difference = state1 - state2

  # Tr|A| is the sum of the absolute values of the eigenvalues of A.
  # (np.abs alone would only take element-wise absolute values.)
  eigenvalues = np.linalg.eigvalsh(difference)

  # Return half the sum of the absolute eigenvalues.
  return 0.5 * np.sum(np.abs(eigenvalues))

if __name__ == "__main__":
  # Create two quantum states: |0><0| and |1><1|.
  state1 = np.array([[1, 0], [0, 0]])
  state2 = np.array([[0, 0], [0, 1]])

  # Store the result under a separate name so the function is not shadowed.
  distance = trace_distance(state1, state2)

  # Print the trace distance.
  print("The trace distance is:", distance)
```

```
The trace distance is: 1.0
```

Measuring a quantum state in the computational basis is a Clifford measurement. A Clifford measurement is a measurement that can be performed by a circuit consisting only of Clifford gates followed by a computational basis measurement. The computational basis measurement itself is a measurement of Pauli Z on each qubit and corresponds to the trivial case in which the Clifford circuit is the identity. A Pauli X measurement is obtained by applying a Hadamard gate, which is a Clifford gate, before measuring in the computational basis.

**what is a random Clifford measurement?**

In the field of quantum computing, a Clifford measurement involves measuring a quantum state in a basis given by a Clifford operation.

The Clifford group is a specific set of operations in quantum mechanics that map Pauli operators to other Pauli operators under conjugation. It includes operations like the Hadamard gate, phase gate, CNOT gate, and others. The Clifford group has important applications in quantum error correction and quantum cryptography.

A random Clifford measurement is a measurement process where the measurement basis is chosen randomly from the set of all possible Clifford operations. Random Clifford operations can be used to create a form of "scrambling" that can be used to probe the properties of quantum systems, and they also find use in protocols for testing quantum computers, like the so-called randomized benchmarking procedure.

When applied to quantum state tomography, random Clifford measurements can be used to extract information about the quantum state in a way that can be more efficient or robust against certain types of errors. For example, they might be used in a protocol for "shadow tomography", where a small number of random measurements are used to estimate properties of the quantum state. The specifics of how these measurements are used and their performance characteristics can depend on the details of the quantum system and the tomography protocol.
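The defining property of Clifford gates, that they map Pauli operators to Pauli operators under conjugation, can be checked for single-qubit gates with a few matrix products. The helper `as_signed_pauli` is a hypothetical name introduced for this sketch; the T gate is included as a non-Clifford counterexample.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = (X + Z) / np.sqrt(2)                 # Hadamard gate
S = np.diag([1, 1j])                     # phase gate
T = np.diag([1, np.exp(1j * np.pi / 4)]) # T gate (not Clifford)

def conjugate(U, P):
    """Return U P U^dagger."""
    return U @ P @ U.conj().T

def as_signed_pauli(M):
    """Return (sign, name) if M equals +/- I, X, Y or Z, otherwise None."""
    for name, P in {"I": I2, "X": X, "Y": Y, "Z": Z}.items():
        for sign in (1, -1):
            if np.allclose(M, sign * P):
                return sign, name
    return None

for gate_name, U in [("H", H), ("S", S), ("T", T)]:
    for pauli_name, P in [("X", X), ("Y", Y), ("Z", Z)]:
        print(f"{gate_name} {pauli_name} {gate_name}^dagger ->",
              as_signed_pauli(conjugate(U, P)))
```

H and S always return a signed Pauli, while conjugating X by T gives $(X+Y)/\sqrt2$, which is reported as `None`.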

**Expectation Value and Backpropagation**

The average value (expectation value) of the measurement result is given by the Born rule:

> $\langle B\rangle=\left\langle\psi\left|U^{\dagger}(\theta) B U(\theta)\right| \psi\right\rangle$

Just linear algebra! Every step is a matrix-vector or matrix-matrix multiplication

Expectation values depend continuously on the gate parameters

**Backpropagating Through Quantum Circuits**

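Backpropagation cannot step through the individual operations of a quantum circuit running on hardware, because intermediate quantum states cannot be read out without disturbing them. However, as long as we don't "zoom in" to what is happening in the quantum circuit, backpropagation can treat the quantum circuit as a single indivisible function.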

The expectation value of a quantum circuit is a differentiable function

> $
f(\theta)=\left\langle\psi\left|U^{\dagger}(\theta) B U(\theta)\right| \psi\right\rangle=\langle B\rangle$

Running on hardware and using the parameter-shift rule, we can provide both ingredients needed by backpropagation

> $
\left(\langle B\rangle, \frac{\partial}{\partial \theta}\langle B\rangle\right)
$

**Spectral Gap**

Open question: how does the minimum eigenvalue gap (spectral gap) of a given Hamiltonian scale with the number of particles, polynomially or exponentially? The eigenvalue gap can become exponentially small.

The spectral gap, in the context of quantum mechanics, is the difference between the two lowest eigenvalues of a system’s Hamiltonian. In graph theory, it usually denotes the difference between the two largest eigenvalues of the adjacency matrix of a graph. It provides crucial information about the system’s dynamics or the graph’s connectivity.

In the context of Hamiltonian mechanics, the eigenvalues represent the possible energy levels of the system. The "minimum eigenvalue gap" of a Hamiltonian refers to the smallest value taken by the gap between the ground-state energy and the first excited energy, for example minimized over the interpolation parameter of a time-dependent Hamiltonian.

The importance of this spectral gap is deeply connected to the behavior of quantum systems. If there is a non-zero gap between the lowest energy state (the ground state) and the rest of the spectrum, the system is said to be gapped. If the gap is zero, the system is gapless.

In a gapped system, at low temperatures, thermal fluctuations aren't typically strong enough to push the system out of its ground state. This makes the system's behavior at low temperatures easier to predict.

On the other hand, in a gapless system, even small fluctuations can cause the system to change its state, resulting in complex behavior.

In the field of quantum computation, the spectral gap plays a crucial role in the speed and feasibility of quantum algorithms. For example, in adiabatic quantum computing, standard forms of the adiabatic condition require a running time that scales as the inverse square of the minimum spectral gap. A smaller spectral gap therefore implies a slower algorithm, and an exponentially small gap implies an exponentially long running time.
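As a concrete example of a spectral gap, the sketch below diagonalizes a small transverse-field Ising chain, $H=-J\sum_i Z_iZ_{i+1}-g\sum_i X_i$, with open boundaries. The model, the parameter values, and the helper names are assumptions chosen for illustration; dense diagonalization is only feasible for a handful of qubits.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_on(site_ops, n):
    """Tensor product over n sites: site_ops[i] on site i, identity elsewhere."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

def tfim_hamiltonian(n, g, J=1.0):
    """Open-chain transverse-field Ising Hamiltonian on n qubits."""
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n - 1):
        H -= J * op_on({i: Z, i + 1: Z}, n)
    for i in range(n):
        H -= g * op_on({i: X}, n)
    return H

def spectral_gap(H):
    """Difference between the two lowest eigenvalues of a Hermitian matrix."""
    vals = np.linalg.eigvalsh(H)  # sorted in ascending order
    return vals[1] - vals[0]

for g in (0.0, 0.5, 2.0):
    print(f"g = {g}: gap = {spectral_gap(tfim_hamiltonian(3, g)):.6f}")
```

At $g=0$ the two ferromagnetic ground states are degenerate, so the gap is zero; any $g>0$ lifts the degeneracy on a finite chain.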

**Getting back to 𝑓(𝜌,𝑂) = tr(𝑂E(𝜌)) : why do I need to calculate the eigenvalues (trace)?**

*From paper: Information-theoretic bounds on quantum advantage in machine learning*

* Trace = sum of Eigenvalues (in this context the trace gives a weighted sum of all outcomes – in other words, the average or expected value.)
* Eigenvalues = each eigenvalue of an observable represents a possible outcome of a measurement

In the expression 𝑓(𝜌,𝑂) = tr(𝑂E(𝜌)), the function tr denotes the trace of a matrix, not its eigenvalues. The trace of a matrix is defined as the sum of the elements on its main diagonal (from the top left to the bottom right). This is true for any square matrix, not just for those representing quantum states or operations.

Now, **for the specific case of this expression, the trace is used to calculate the expected value of an observable for a quantum state that has undergone some operation E.** Here's why:

In quantum mechanics, the expected (or average) value of an observable O for a state ρ is given by the formula tr(𝑂𝜌). This comes from the Born rule, one of the key postulates of quantum mechanics, which relates the probabilities of measurement outcomes to the state of the system. The observable O is represented by a Hermitian operator, and the state ρ is represented by a density operator.

In this expression, E(𝜌) represents the state after some operation E has been applied. So tr(𝑂E(𝜌)) is the expected value of the observable O for the state E(𝜌).

The trace is crucial here because it sums over all possible outcomes, weighted by their probabilities. **Each eigenvalue λ_i of an observable represents a possible outcome of a measurement, and the corresponding diagonal element ⟨i|E(𝜌)|i⟩ of the density matrix, written in the eigenbasis of O, gives the probability of that outcome.** Taking the trace therefore gives tr(𝑂E(𝜌)) = Σ_i λ_i ⟨i|E(𝜌)|i⟩, a weighted sum of all outcomes, in other words the average or expected value.
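The equality between the trace formula and the probability-weighted sum of eigenvalues can be checked numerically. In this sketch the observable is Pauli X and the state is an arbitrary illustrative mixture, standing in for E(ρ); the function names are not from any library.

```python
import numpy as np

def expectation_via_trace(O, rho):
    """tr(O rho)."""
    return np.real(np.trace(O @ rho))

def expectation_via_outcomes(O, rho):
    """Sum of eigenvalues of O weighted by the outcome probabilities <v_i|rho|v_i>."""
    eigvals, eigvecs = np.linalg.eigh(O)
    probs = np.array([np.real(np.vdot(v, rho @ v)) for v in eigvecs.T])
    return float(np.dot(eigvals, probs)), probs

X = np.array([[0.0, 1.0], [1.0, 0.0]])
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Illustrative mixed state: 75% |0><0| and 25% |+><+|
rho = 0.75 * np.outer(ket0, ket0) + 0.25 * np.outer(ket_plus, ket_plus)

via_trace = expectation_via_trace(X, rho)
via_outcomes, probs = expectation_via_outcomes(X, rho)

print("tr(O rho)            =", via_trace)
print("sum_i lambda_i p_i   =", via_outcomes)
print("outcome probabilities:", probs)
```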

**How can you accurately predict expectation values of Pauli observables in an unknown n-qubit quantum state ρ?**

In quantum mechanics, the expectation value of an observable represented by an operator A on a state ρ can be calculated using the formula tr(Aρ), where "tr" denotes the trace operation.

The Pauli matrices (or Pauli operators), named after Wolfgang Pauli, are a set of 2 × 2 matrices which are fundamental in quantum mechanics. They are defined as:

- Pauli-X: σ_x = [[0, 1], [1, 0]]
- Pauli-Y: σ_y = [[0, -i], [i, 0]]
- Pauli-Z: σ_z = [[1, 0], [0, -1]]

Together with the 2 × 2 identity I, these operators can be extended to n-qubit systems: the 4^n tensor products of I, X, Y and Z form a complete basis for the Hermitian operators on that system. A Pauli observable on an n-qubit system is such a tensor product, with one factor (a Pauli matrix or the identity) for each qubit. For example, Z ⊗ Z is a Pauli observable for a 2-qubit system that measures the parity of the two qubits.
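A minimal sketch of how such Pauli observables can be built numerically with Kronecker products. The helper name `pauli_string` and the dictionary `PAULIS` are illustrative, not taken from any library.

```python
import numpy as np
from functools import reduce

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. "ZZ" -> Z ⊗ Z."""
    return reduce(np.kron, [PAULIS[c] for c in label])

ZZ = pauli_string("ZZ")

# Z ⊗ Z is diagonal; its entries are the parities of |00>, |01>, |10>, |11>
parities = np.diag(ZZ).real

# Different Pauli strings are orthogonal under the trace inner product
overlap = np.trace(pauli_string("XZ").conj().T @ ZZ)
```

The diagonal of Z ⊗ Z is +1 on even-parity basis states and -1 on odd-parity ones, which is the sense in which it "measures parity".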

***In order to predict the expectation value of a Pauli observable in an unknown state ρ, the most direct (and most expensive) approach is full quantum state tomography***, which involves making many measurements on many identically-prepared quantum systems, and using the results to reconstruct the state ρ. Once you have ρ, you can use the formula tr(Aρ) to calculate the expectation value of any observable A.

However, ***full quantum state tomography is not always practical, especially for large systems, due to the number of measurements required***. There are other techniques like direct estimation of the expectation value via sampling, adaptive methods, or compressed sensing techniques which can estimate the expectation value with fewer measurements.

In a practical quantum computing scenario, the expectation value of a Pauli observable would often be estimated by preparing many copies of the state, measuring each in the basis corresponding to the observable, and taking the average of the results. These measurements can be implemented with the appropriate rotations and then a computational basis measurement.
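The sampling procedure described above can be simulated classically for a single qubit. This is only a sketch: the state, the number of shots and the fixed random seed are assumptions made for illustration, and a real experiment would draw the outcomes from hardware instead of `rng.choice`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Illustrative state |psi> = cos(t/2)|0> + sin(t/2)|1>, for which <X> = sin(t)
theta = 0.7
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

# Measuring X: rotate with a Hadamard, then measure in the computational basis
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
probs = np.abs(H @ psi) ** 2
probs = probs / probs.sum()      # guard against floating-point rounding

shots = 20000
# Bit 0 corresponds to eigenvalue +1 of X, bit 1 to eigenvalue -1
outcomes = rng.choice([+1, -1], size=shots, p=probs)

estimate = outcomes.mean()       # empirical average of the +/-1 outcomes
exact = np.sin(theta)            # value of tr(X rho) for this state
```

The statistical error of `estimate` shrinks like 1/sqrt(shots), so the number of copies needed grows quadratically with the desired precision.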

Remember, this is all assuming you have physical access to the quantum state or quantum system. If you only have classical access to information about the system, you can't accurately predict the expectation values without additional assumptions or information. Quantum mechanics fundamentally limits what you can know about a quantum system without direct quantum access to the system.

**what is the difference between mixed state and product state?**

A mixed state and a product state are two different types of quantum states that arise in quantum mechanics and quantum information theory, each with its own distinct properties and interpretations.

1. **Mixed State**: A mixed state describes a statistical mixture of different quantum states, each with a certain probability. Mathematically, a mixed state is represented by a density matrix, which is a positive-semidefinite Hermitian operator with trace 1. If a system is in a mixed state, it means we have some classical uncertainty about which specific quantum state it's in. For example, a system might be in state |0⟩ with probability 1/2 and in state |1⟩ with probability 1/2. This would correspond to the mixed state represented by the density matrix ρ = 1/2|0⟩⟨0| + 1/2|1⟩⟨1|.

2. **Product State**: A product state refers to a state of a composite system that can be written as the tensor product of states of its subsystems. For example, if you have a two-qubit system and the first qubit is in state |0⟩ and the second qubit is in state |1⟩, the overall state of the system is the product state |0⟩⊗|1⟩, often written as |01⟩. A product state implies that there are no correlations between the subsystems, i.e., the subsystems are not entangled. If a state of a composite system cannot be written as a tensor product of states of its subsystems, it's called an entangled state.

So, in summary, the difference lies in what each state represents: a mixed state represents classical uncertainty about which pure state a system is in, while a product state represents a separable (i.e., non-entangled) state of a composite system. Notably, these are not mutually exclusive: you can have a mixed state of product states, for example.
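The two notions can be told apart numerically. In the sketch below, purity tr(ρ²) detects mixedness (it is 1 only for pure states), and for a pure two-qubit state the purity of a reduced state detects entanglement. The helper names are illustrative.

```python
import numpy as np

def purity(rho):
    return np.trace(rho @ rho).real

def reduced_first_qubit(rho_ab):
    """Partial trace over the second qubit of a two-qubit density matrix."""
    r = rho_ab.reshape(2, 2, 2, 2)        # indices: a, b, a', b'
    return np.einsum("abcb->ac", r)       # sum over b = b'

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Mixed single-qubit state: 1/2 |0><0| + 1/2 |1><1|
rho_mixed = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(ket1, ket1.conj())

# Pure product state |01> and pure entangled Bell state (|00> + |11>)/sqrt(2)
ket01 = np.kron(ket0, ket1)
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_prod = np.outer(ket01, ket01.conj())
rho_bell = np.outer(bell, bell.conj())

purity_mixed = purity(rho_mixed)                              # mixed: below 1
purity_prod_reduced = purity(reduced_first_qubit(rho_prod))   # product: subsystem stays pure
purity_bell_reduced = purity(reduced_first_qubit(rho_bell))   # entangled: subsystem is mixed
```

Note that the Bell state is pure as a whole, yet each of its qubits on its own is in a mixed state.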

**what is a POVM measurement? (vs projective measurements)**

A Positive Operator-Valued Measure (POVM) is a mathematical formalism used in quantum mechanics to describe the process of making a measurement on a quantum system. **Unlike projective measurements (also called von Neumann measurements), which are limited to a specific set of states (the eigenstates of an observable), POVMs can represent more general kinds of measurements.**

A POVM is a collection of positive semi-definite operators {E_i} acting on the state space of a quantum system, which sum up to the identity operator, that is, ∑i E_i = I. These operators E_i are also called "effects".

When you perform a measurement described by a POVM on a quantum system in a state represented by a density matrix ρ, the probability of obtaining outcome i (associated with operator E_i) is given by tr(E_i ρ), where tr() denotes the trace operation.

The ***key aspect of POVMs is that their elements need not be mutually orthogonal projectors, so a POVM can have more outcomes than the dimension of the Hilbert space. Such measurements are typical in situations like indirect or incomplete measurements***, measurements involving ancillary systems, and measurements involving quantum entanglement. This makes POVMs an essential tool in quantum information theory and quantum computing.
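A standard textbook example is the qubit "trine" POVM, which has three outcomes on a two-dimensional space and therefore cannot be a projective measurement. The sketch below builds its effects E_k = (2/3)|φ_k⟩⟨φ_k| with |φ_k⟩ = cos(θ_k)|0⟩ + sin(θ_k)|1⟩ and θ_k ∈ {0, 2π/3, 4π/3}, and evaluates the outcome probabilities tr(E_k ρ); the input state |0⟩⟨0| is an arbitrary choice.

```python
import numpy as np

angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
effects = []
for a in angles:
    phi = np.array([np.cos(a), np.sin(a)], dtype=complex)
    effects.append((2 / 3) * np.outer(phi, phi.conj()))

# Defining properties of a POVM: positive semi-definite effects that sum to I
sums_to_identity = np.allclose(sum(effects), np.eye(2))
all_positive = all(np.linalg.eigvalsh(E).min() > -1e-12 for E in effects)

# The effects are not projectors (E @ E != E), so this is not a von Neumann measurement
is_projective = all(np.allclose(E @ E, E) for E in effects)

# Outcome probabilities for the illustrative input state rho = |0><0|
rho = np.array([[1, 0], [0, 0]], dtype=complex)
probs = [np.trace(E @ rho).real for E in effects]
```

For this input the three outcomes occur with probabilities 2/3, 1/6 and 1/6.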

* <font color="red">mathematical examples?

* <font color="red">code examples?

**what is a POVM measurement on the state E(|xi⟩⟨xi|)?**

*Von Neumann Measurement vs POVM*

A Positive Operator-Valued Measure (POVM) is a type of generalized measurement used in quantum mechanics. When we want to make a measurement in quantum mechanics, we typically talk about observables, which are represented by Hermitian operators, and the associated eigenstates and eigenvalues. The eigenvalues are the possible outcomes of the measurement, and the state collapses into the corresponding eigenstate upon measurement.

However, there are situations where a standard von Neumann measurement (which directly associates observables with Hermitian operators) is not sufficient. For example, if we want to consider more general types of measurements or if we're working with a subset of a larger system, we might need to use a POVM.

A POVM is a set of positive semi-definite operators {E_i} that sum to the identity operator. Each E_i corresponds to a potential outcome of the measurement. The probability of each outcome i when measuring a state ρ is given by tr(E_i ρ).

So if you have a state represented by E(|xi⟩⟨xi|) and a POVM {E_i}, the probability of getting the outcome corresponding to E_i would be tr(E_i E(|xi⟩⟨xi|)).

The operators E_i are measurement effects, not states. They don't need to be orthogonal to each other, and they don't need to be projectors or rank-one operators, which makes POVMs a very powerful and general measurement framework in quantum mechanics. They are often used in quantum information theory and quantum computation where complex measurement schemes are more common.


**what is a two-copy Bell basis measurement?**

A two-copy Bell basis measurement is a measurement in the Bell basis on two copies of a quantum state. This is an important concept in quantum information theory and quantum computing, especially in the context of entanglement and quantum teleportation.

The Bell basis consists of four maximally entangled two-qubit states, known as Bell states. These states are:

1. |Φ+⟩ = 1/√2 (|00⟩ + |11⟩)
2. |Φ-⟩ = 1/√2 (|00⟩ - |11⟩)
3. |Ψ+⟩ = 1/√2 (|01⟩ + |10⟩)
4. |Ψ-⟩ = 1/√2 (|01⟩ - |10⟩)

When we talk about a "two-copy Bell basis measurement", we are referring to performing a Bell basis measurement on two copies of a quantum state. Let's consider a state ρ. We have two copies of ρ, let's say ρ1 and ρ2. The two-copy Bell basis measurement would involve performing a Bell basis measurement on the combined system of ρ1 and ρ2.

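This kind of measurement gives access to quantities that are nonlinear in ρ. For example, the probability of the singlet outcome |Ψ-⟩ determines the purity tr(ρ²), and because every Bell state is a simultaneous eigenstate of X⊗X, Y⊗Y and Z⊗Z, the outcome statistics yield tr(Pρ)² for all Pauli operators P at once. For an n-qubit state, the Bell measurement is applied to each pair of corresponding qubits of the two copies. Bell basis measurements are also a building block of protocols such as entanglement distillation and entanglement swapping.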
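A minimal single-qubit sketch of what a Bell basis measurement on ρ ⊗ ρ reveals. It uses two facts: SWAP = I - 2|Ψ-⟩⟨Ψ-|, so tr(ρ²) = 1 - 2·P(Ψ-), and the Bell states are eigenstates of P ⊗ P for every Pauli P, so the outcome probabilities give tr(Pρ)². The Bloch vector (0.3, 0, 0.6) is an arbitrary choice, and the probabilities are computed exactly rather than sampled.

```python
import numpy as np

s = 1 / np.sqrt(2)
bell = {
    "Phi+": s * np.array([1, 0, 0, 1], dtype=complex),
    "Phi-": s * np.array([1, 0, 0, -1], dtype=complex),
    "Psi+": s * np.array([0, 1, 1, 0], dtype=complex),
    "Psi-": s * np.array([0, 1, -1, 0], dtype=complex),
}

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Illustrative single-qubit state with Bloch vector (0.3, 0, 0.6)
rho = 0.5 * (np.eye(2) + 0.3 * X + 0.6 * Z)
two_copies = np.kron(rho, rho)

# Probability of each Bell outcome on the two copies
probs = {k: (v.conj() @ two_copies @ v).real for k, v in bell.items()}

# Purity from the singlet probability
purity_est = 1 - 2 * probs["Psi-"]

# Z⊗Z has eigenvalue +1 on Phi+/Phi- and -1 on Psi+/Psi-
z_squared = probs["Phi+"] + probs["Phi-"] - probs["Psi+"] - probs["Psi-"]
# X⊗X has eigenvalue +1 on Phi+/Psi+ and -1 on Phi-/Psi-
x_squared = probs["Phi+"] - probs["Phi-"] + probs["Psi+"] - probs["Psi-"]
```

Here `z_squared` and `x_squared` equal tr(Zρ)² and tr(Xρ)², obtained from the same measurement setting; only the signs of the expectation values are lost.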

**What is meant by an observable with constant spectral norm?**

An observable in quantum mechanics is represented by a Hermitian operator. **The spectral norm (or operator norm) of a Hermitian operator is equal to its largest eigenvalue in absolute value**.

So, if an observable is described as having a "constant spectral norm," it means that the maximum absolute value of its possible measurement outcomes (eigenvalues) is bounded by a constant that does not grow with the size of the system, for example with the number of qubits n.

For example, the Pauli matrices that represent observables in a two-level quantum system (like a qubit) have a spectral norm of 1, because their eigenvalues are +1 and -1. So, these could be described as observables with constant spectral norm.

In practice, this property could be useful because it gives a bound on the possible outcomes of measurements, which can simplify calculations or analyses. It also ensures that the observable is a bounded operator, which is an important property in functional analysis and quantum mechanics.
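A short numerical illustration, assuming NumPy: `np.linalg.norm(A, ord=2)` returns the largest singular value, which for a Hermitian matrix equals the largest eigenvalue in absolute value. The example observables are chosen only for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def spectral_norm(A):
    return np.linalg.norm(A, ord=2)   # largest singular value

# A Pauli string has spectral norm 1 regardless of the number of qubits
norm_Z = spectral_norm(Z)
norm_ZZZ = spectral_norm(np.kron(np.kron(Z, Z), Z))

# For a Hermitian matrix the spectral norm equals max |eigenvalue|
O = 0.5 * X + 0.5 * Z
largest_abs_eig = np.max(np.abs(np.linalg.eigvalsh(O)))

# Contrast: a sum of local terms, Z⊗I + I⊗Z, has a norm that grows with the number of terms
norm_sum = spectral_norm(np.kron(Z, I2) + np.kron(I2, Z))
```

The last line shows an observable whose norm is not constant: adding one Z term per qubit makes the norm scale with n.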

**what are quantum states using the fixed POVM?**

In quantum mechanics, a Positive Operator-Valued Measure (POVM) is a set of measurement operators that act on quantum states. These operators are designed such that they satisfy certain mathematical properties that make them correspond to physical measurements. Specifically, each operator in the set is positive (its eigenvalues are nonnegative), and the sum of all the operators is the identity operator.

A POVM provides a generalization of the concept of a quantum measurement. In the simplest case, quantum measurements are described by a set of projection operators that project onto different states in a quantum system. However, in more complex situations, such as when dealing with noise or imperfect detection, a more general type of measurement is needed, which is where POVMs come in.

A "fixed" POVM would refer to a specific, predetermined set of measurement operators. For instance, in a quantum computing experiment, a researcher might choose a fixed POVM to measure the output states of a quantum circuit.

As for "quantum states using the fixed POVM," it is a bit unclear without more context. However, it could mean the process of applying this predetermined set of measurement operators to certain quantum states. The result of such a measurement would be a set of probabilities (one for each operator in the POVM), indicating the likelihood of each possible measurement outcome. These probabilities would then be used to infer properties of the quantum state or the quantum system.


**what are 𝑘-body reduced density matrices?**

In quantum mechanics, the state of a system of particles can be represented by a density matrix, which is a complex, positive semi-definite matrix with a trace equal to 1.

For a system of N particles, the full density matrix lives in a high-dimensional Hilbert space (the tensor product of the individual particle's Hilbert spaces), and contains complete information about the state of the system. However, it can often be difficult to work with such high-dimensional objects.

A k-body reduced density matrix is a lower-dimensional object that contains information about only a subset of the particles in the system.

Formally, the k-body reduced density matrix is obtained by taking the partial trace over N-k of the particles in the system. This process "averages out" the information about these N-k particles, leaving behind a density matrix that describes the remaining k particles.

The k-body reduced density matrix is a useful tool in many areas of quantum physics, including quantum chemistry and quantum information theory. It allows us to study the behavior of small subsystems within a larger quantum system, even when we can't (or don't want to) keep track of the full state of all the particles. For example, the 1-body reduced density matrix (also known as the one-particle density matrix) is often used in quantum chemistry to describe electron behavior in molecular systems.
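For qubits, the partial trace can be sketched with a reshape and an `einsum`. The function `reduced_density_matrix` is an illustrative helper (not a library call), and the 3-qubit GHZ state is used only as a test case.

```python
import numpy as np

def reduced_density_matrix(rho, keep, n):
    """Trace out all qubits not listed in `keep` from an n-qubit density matrix."""
    keep = list(keep)
    traced = [q for q in range(n) if q not in keep]
    r = rho.reshape([2] * (2 * n))   # axes 0..n-1: row indices, n..2n-1: column indices
    # Order the axes as: kept rows, kept columns, traced rows, traced columns
    perm = keep + [q + n for q in keep] + traced + [q + n for q in traced]
    r = np.transpose(r, perm)
    k, t = len(keep), len(traced)
    r = r.reshape(2**k, 2**k, 2**t, 2**t)
    return np.einsum("abcc->ab", r)  # trace over the discarded qubits

# Test case: GHZ state (|000> + |111>)/sqrt(2)
n = 3
ghz = np.zeros(2**n, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
rho_ghz = np.outer(ghz, ghz.conj())

rho_1 = reduced_density_matrix(rho_ghz, keep=[0], n=n)      # 1-body RDM
rho_2 = reduced_density_matrix(rho_ghz, keep=[0, 1], n=n)   # 2-body RDM
```

For the GHZ state the 1-body reduced density matrix is maximally mixed, and the 2-body one is an equal classical mixture of |00⟩ and |11⟩, so the coherence of the full state is invisible in any proper subsystem.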

###### *CPTP Maps and Quantum Channels*

**CPTP Maps in Quantum Machine Learning**

*A quantum channel is a completely positive, trace-preserving (CPTP) map. A CPTP map takes one density matrix (the input) to another (the output).*

*Source: Information-theoretic bounds on quantum advantage in machine learning*

*tldr: when the goal is a small average prediction error, quantum ML (which operates on quantum data) cannot substantially outperform classical ML (which operates on classical data). Quantum ML can outperform classical ML only when the goal is a small worst-case prediction error (rather than a small average prediction error), where an exponential separation becomes possible.*

We are interested in predicting functions of the form

$f(x)=\operatorname{tr}(O \mathcal{E}(|x\rangle\langle x|))$

* where $x$ is a classical input, e.g., chemicals involved in the reaction, a description of the molecule, or the intensity of lasers that control the neutral atoms
* $\mathcal{E}$ is an arbitrary (possibly unknown) completely positive and trace preserving (CPTP) map - it characterizes a quantum evolution happening in the lab. Depending on the parameter $x$, it produces the quantum state $\mathcal{E}(|x\rangle\langle x|)$
* $O$ is a certain known observable (what the experimentalist measures at the end of the experiment).

This equation encompasses any physical process that takes a classical input and produces a real number as output. The goal is to construct a function $h(x)$ that accurately approximates $f(x)$ after accessing the physical process $\mathcal{E}$ as few times as possible.

The goal is to predict the measurement outcome for new physical experiments, with new values of x that have not been encountered during the training process.

1. All problems that are approximately learnable by a quantum ML model are also approximately learnable by some restricted classical ML model which executes the quantum process $\mathcal{E}$ a comparable number of times. This applies, in particular, to predicting outputs of quantum-mechanical processes.

  * For the task of learning classical Boolean circuits, fundamental limits on quantum advantage have been established in previous work [11–13, 31, 80, 94]. Theorem 1 generalizes these existing results to the task of learning outcomes of quantum processes.

  * Therefore finding that classical and quantum ML have comparable power (for average-case prediction) boosts our hopes that the combination of classical ML and near-term quantum algorithms [47, 52, 53, 76, 79] may fruitfully address challenging quantum problems in physics, chemistry, and materials science.

2. Quantum ML can have an exponential advantage over classical ML for certain problems where the objective is achieving a specified worst-case prediction error.





**What is the CPTP map 𝑓(𝜌,𝑂) = tr(𝑂E(𝜌))?**

*Source: Learning to predict arbitrary quantum processes*

The expression you've given, 𝑓(𝜌,𝑂) = tr(𝑂E(𝜌)), doesn't directly represent a completely positive trace-preserving (CPTP) map. Rather, it looks like the representation of an expectation value in the context of quantum mechanics or quantum information theory.

In the given equation:
- 𝜌 is a density operator representing the state of a quantum system.
- 𝑂 is an observable, which is a Hermitian operator corresponding to a physical quantity that can be measured.
- E(𝜌) is some operation being performed on the state 𝜌; this could potentially represent a quantum channel or some other operation. The nature of this operation isn't specified by your question.
- tr( ) denotes the trace, which for an operator is the sum of its eigenvalues.

So, tr(𝑂E(𝜌)) is the expected value (or mean value) of the measurement of the observable 𝑂 in the state resulting from the operation E(𝜌).

A CPTP map, on the other hand, describes the evolution of quantum states when they interact with an environment. It is a map that takes a density operator to another density operator, and it has two properties:
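1. It's completely positive, meaning that it maps positive semi-definite operators to positive semi-definite operators even when it acts on only one part of a larger, possibly entangled system (that is, E ⊗ I is a positive map for every size of the additional system). This ensures that the output is always a valid physical state.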
2. It's trace-preserving, which ensures that the total probability of all possible outcomes remains equal to 1.

Given this, if E were to represent a CPTP map, then tr(𝑂E(𝜌)) still wouldn't itself be a CPTP map, but instead the expectation value of an observable given a quantum state evolved by a CPTP map.
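Complete positivity is strictly stronger than preserving positivity on the system alone. A standard way to test it is Choi's theorem: a linear map is completely positive exactly when its Choi matrix is positive semi-definite. The sketch below applies this test to a bit-flip channel (completely positive) and to the matrix transpose (positive, but not completely positive). The helper `choi_matrix` and the value p = 0.2 are illustrative choices.

```python
import numpy as np

def choi_matrix(channel, d):
    """Choi matrix sum_ij |i><j| ⊗ channel(|i><j|) of a linear map on d x d matrices."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E_ij = np.zeros((d, d), dtype=complex)
            E_ij[i, j] = 1
            J += np.kron(E_ij, channel(E_ij))
    return J

X = np.array([[0, 1], [1, 0]], dtype=complex)

def bit_flip(rho, p=0.2):
    return (1 - p) * rho + p * X @ rho @ X

def transpose_map(rho):
    return rho.T

# Smallest Choi eigenvalue: >= 0 means completely positive
min_eig_bit_flip = np.linalg.eigvalsh(choi_matrix(bit_flip, 2)).min()
min_eig_transpose = np.linalg.eigvalsh(choi_matrix(transpose_map, 2)).min()
```

The transpose keeps every single-qubit density matrix valid, yet its Choi matrix (the SWAP operator) has eigenvalue -1, so applying it to half of an entangled pair can produce an operator that is not a valid state.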

**Can you calculate the density matrices and quantum channels of classical operations that are represented as quantum operations?**

Yes, you can. A classical operation represented as a quantum operation can indeed have a density matrix and a quantum channel that describes it. Let's take the classical NOT operation as an example, which can be represented as a quantum operation using the Pauli-X (or NOT) gate.

1. **Density Matrix**: The density matrix represents the state of a quantum system. If we start with a single qubit in the state |0⟩ (which can represent a classical bit in the state 0), the density matrix is given by |0⟩⟨0|, which is:

    ```
    ρ_0 = |0⟩⟨0| = [1, 0]
                      [0, 0]
    ```

    If we apply the X gate (the quantum equivalent of the classical NOT operation), the qubit goes to the state |1⟩, and its density matrix is given by |1⟩⟨1|:

    ```
    ρ_1 = Xρ_0X† = |1⟩⟨1| = [0, 0]
                                [0, 1]
    ```

    where X† is the conjugate transpose (also known as the Hermitian adjoint) of X, and it's equal to X because X is a Hermitian operator.

2. **Quantum Channel**: The quantum channel describes a quantum operation. It is a map that takes the density matrix of the input state to the density matrix of the output state. For the X gate, the map is given by:

    ```
    Λ_X(ρ) = XρX†
    ```

    where ρ is the density matrix of the input state. This map tells us how the X gate transforms any given input state.

In the above examples, we have treated a classical bit as a qubit in a definite state (either |0⟩ or |1⟩), and we have treated a classical NOT operation as a quantum X operation. This shows how classical operations and states can be embedded in the quantum framework. However, remember that this doesn't capture many of the unique features of quantum states and operations, such as superposition and entanglement.

**what is the function space in a quantum computing operation (cptp map)?**

In quantum computing, operations (also known as quantum gates or quantum channels) are often represented as Completely Positive Trace-Preserving (CPTP) maps. These maps represent the evolution of a quantum state in a quantum computation.

**A quantum state is typically described as a density matrix**, which is a positive semidefinite operator with trace 1 on a Hilbert space. A Hilbert space is a complex vector space equipped with an inner product operation, and it is a key structure in quantum mechanics as it provides the stage on which quantum states exist and evolve.

> **The "function space" in this context is the set of all possible quantum states, i.e., the set of all density matrices on the Hilbert space.**

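A CPTP map is a linear, completely positive, and trace-preserving transformation on this function space. Strictly speaking, the map is linear on the full vector space of operators on the Hilbert space; the density matrices form a convex subset of that space, and the map sends this subset into itself. It takes an initial quantum state (a point in the function space) and produces a final quantum state (another point in the same function space), describing the evolution of the state due to the quantum operation.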

In other words, **a CPTP map is a function that maps the function space of initial quantum states to the function space of final quantum states. Therefore, it is said to act on the "function space" of quantum states**.

To make this a bit more concrete, consider a single qubit. Its Hilbert space is a two-dimensional complex vector space, and the set of density matrices on this space is the set of all 2x2 positive semidefinite matrices with trace 1. A single-qubit quantum operation is a CPTP map on this set of density matrices.

CPTP maps are crucial in quantum information theory because they can represent not only idealized quantum gates but also realistic quantum operations that include effects such as decoherence and noise.


**Not all physical errors can be linear CPTP maps in a correlation space (quantum channels)**

https://www.nature.com/articles/srep00508

In the framework of quantum computational tensor network, which is a general framework of measurement-based quantum computation, the resource many-body state is represented in a tensor-network form (or a matrix-product form) and universal quantum computation is performed in a virtual linear space, which is called a correlation space, where tensors live. Since any unitary operation, state preparation and the projection measurement in the computational basis can be simulated in a correlation space, it is natural to expect that fault-tolerant quantum circuits can also be simulated in a correlation space. However, we point out that not all physical errors on physical qudits appear as linear completely-positive trace-preserving errors in a correlation space. Since the theories of fault-tolerant quantum circuits known so far assume such noises, this means that the simulation of fault-tolerant quantum circuits in a correlation space is not so straightforward for general resource states.



**What are quantum channel and Kraus operators?**

In quantum mechanics, a quantum channel is a mathematical model used to describe the evolution of quantum states due to physical processes, which could be the deterministic dynamics of an isolated quantum system, or could include the effects of noise, dissipation, and decoherence that result from interactions with an environment. Quantum channels are crucial in quantum information theory and quantum computing, where they are used to model realistic, noisy quantum operations.

A quantum channel is a completely positive, trace-preserving (CPTP) map. These properties reflect the basic requirements that any physical process must satisfy:

- **Complete positivity** ensures that the evolution is physically valid not only for individual quantum states, but also for quantum states that are part of larger systems. This is essential because quantum systems can be entangled, meaning that the state of one system can't be described independently of the state of another system.
- **Trace preservation** ensures that the total probability remains 1, reflecting the probabilistic interpretation of quantum mechanics.

One common way to represent a quantum channel is by using Kraus operators. Given a quantum channel Λ, we can find a set of operators {K_i} such that the action of the channel on any state ρ is given by

Λ(ρ) = Σ_i K_i ρ K_i†

where † denotes the conjugate transpose, and the sum is over all i. The operators K_i are called the Kraus operators, or Kraus representation, of the channel.

The Kraus operators must satisfy the completeness condition Σ_i K_i† K_i = I, where I is the identity operator, to ensure that the channel is trace-preserving.

The Kraus representation is not unique; there can be many different sets of Kraus operators that represent the same quantum channel. The specific form of the Kraus operators can depend on the physical interpretation of the channel. For example, in a quantum system subject to dissipative processes or measurement, each Kraus operator might represent a different possible outcome or error process.
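As a sketch of these statements, the code below uses the amplitude damping channel with the standard Kraus operators K_0 = [[1, 0], [0, √(1-γ)]] and K_1 = [[0, √γ], [0, 0]]. It checks the completeness condition, applies the channel, and shows non-uniqueness by mixing the two operators with a unitary (Hadamard) matrix. The value γ = 0.3 and the helper `apply_channel` are illustrative.

```python
import numpy as np

def apply_channel(kraus_ops, rho):
    """Lambda(rho) = sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

gamma = 0.3  # illustrative damping probability
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
kraus = [K0, K1]

# Completeness: sum_i K_i^dagger K_i = I guarantees trace preservation
completeness = sum(K.conj().T @ K for K in kraus)

# Excited state |1><1| decays to |0><0| with probability gamma
rho_in = np.array([[0, 0], [0, 1]], dtype=complex)
rho_out = apply_channel(kraus, rho_in)

# A different Kraus set for the same channel: unitary mixture of K0 and K1
kraus_alt = [(K0 + K1) / np.sqrt(2), (K0 - K1) / np.sqrt(2)]
rho_out_alt = apply_channel(kraus_alt, rho_in)
```

Both Kraus sets give the same output, which is the unitary freedom in the Kraus representation.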


**what is a quantum channel?**

A quantum channel is a mathematical construct used in quantum information theory to describe the process of transmitting quantum information from one location to another, or more generally, the evolution of quantum states in a system over time. This evolution can be due to the natural dynamics of the system or because of an interaction with an environment, which might cause phenomena like decoherence or noise.

Mathematically, a quantum channel is a completely positive, trace-preserving (CPTP) map. It takes a density matrix (which represents a quantum state) as input and returns another density matrix as output.

The "completely positive" part means that the map sends positive semi-definite operators to positive semi-definite operators, even when it acts on only one part of a larger, possibly entangled system. This is required for the output to represent a valid physical state. The "trace-preserving" part means that the total probability of all possible outcomes (which is represented by the trace of the density matrix) remains equal to 1.

One common example of a quantum channel is the depolarizing channel, which with some probability p replaces the state of a qubit by the maximally mixed state I/2 and otherwise leaves it unchanged. There are many other types of quantum channels that model different kinds of physical processes, including lossy channels, dephasing channels, amplitude damping channels, and more. Each channel has a different impact on the quantum states that pass through it, often reflecting the effects of different types of environmental noise or decoherence.

**Can you give me an example of a quantum channel?**

Certainly, a straightforward example of a quantum channel is the "bit-flip channel." This channel models the possible flipping of a qubit due to errors in a quantum computing environment.

Let's denote the basis states of our qubit as |0⟩ and |1⟩. Suppose we have a single qubit quantum state ρ which we send through the bit-flip channel. After going through the channel, each qubit can either stay the same or be flipped, i.e., |0⟩ goes to |1⟩ and |1⟩ goes to |0⟩.

Let's also say that the probability of a flip occurring is p. Then the probability that nothing happens is (1-p).

The behavior of the bit-flip channel Λ can be described as:

Λ(ρ) = (1-p)ρ + p XρX

where X is the Pauli-X gate, which flips the state of a qubit.

This description of the channel is essentially a probabilistic mixture of the identity operation (which leaves the state ρ unchanged) with weight (1-p), and the bit-flip operation (XρX) with weight p.

This is a completely positive, trace-preserving (CPTP) map. It is "completely positive" because it has a Kraus representation with operators K_0 = √(1-p) I and K_1 = √p X, and it is "trace-preserving" because these operators satisfy K_0†K_0 + K_1†K_1 = (1-p)I + pI = I, ensuring that the output state remains normalized.

As a consequence of going through this channel, quantum information can be lost or corrupted, which is a significant consideration in the design of quantum computing systems and quantum error correction codes.
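The bit-flip channel Λ(ρ) = (1-p)ρ + p XρX is easy to try out numerically. In this sketch the flip probability p = 0.1 and the input states are arbitrary choices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def bit_flip(rho, p):
    """Bit-flip channel: leave rho alone with probability 1-p, apply X with probability p."""
    return (1 - p) * rho + p * X @ rho @ X

p = 0.1
rho_zero = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
rho_plus = 0.5 * np.ones((2, 2), dtype=complex)         # |+><+|

out_zero = bit_flip(rho_zero, p)   # becomes a classical mixture of |0> and |1>
out_plus = bit_flip(rho_plus, p)   # |+> is an eigenstate of X, so it is unaffected
```

The output for |0⟩ has populations (1-p, p), while |+⟩ passes through unchanged, so how much information is corrupted depends on the input state.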

**Quantum channels: how could you formalize stochastic maps between basis changes in quantum circuits?**

Basis changes in quantum circuits are usually modeled with unitary transformations. A unitary transformation is a linear transformation that preserves the inner product in the space, and in quantum mechanics, it corresponds to a deterministic evolution of the quantum state.

However, if we're looking to introduce stochastic elements into quantum circuits, there are ways to do so, primarily through the use of quantum channels.

A quantum channel is a completely positive, trace-preserving (CPTP) map that represents a physical process in quantum mechanics. While unitary transformations correspond to ideal, noiseless operations, quantum channels can represent realistic, noisy operations. This includes both coherent operations, represented by unitary transformations, and incoherent or stochastic operations, represented by non-unitary transformations.

To introduce stochasticity, you can consider the notion of a mixed state, represented by a density matrix, which can be understood as a statistical mixture of different pure states. A quantum channel can take a pure state and produce a mixed state, representing a stochastic process.

More precisely, a quantum channel Λ can be represented by a set of Kraus operators {K_i}. These Kraus operators act on a state ρ as follows:

Λ(ρ) = Σ_i K_i ρ K_i†

where the sum is over all i, and † denotes the Hermitian adjoint (conjugate transpose). The Kraus operators are required to satisfy the completeness condition Σ_i K_i† K_i = I, where I is the identity operator.

To encode a stochastic map into a quantum circuit, you'd need to find a suitable set of Kraus operators representing the stochastic process, and incorporate them into your circuit model.

Keep in mind, however, that not all stochastic maps can be physically realized as quantum channels. The map must satisfy the conditions for a CPTP map, i.e., it must be completely positive and preserve the trace of the density matrix.

Also note that while quantum channels provide a way to model stochastic processes, they are not usually available as primitive operations on current quantum computers, which typically apply only unitary gates, measurements, and resets directly. A general channel can still be realized indirectly: by the Stinespring dilation, any channel can be written as a unitary acting jointly on the system and an ancilla, followed by discarding (or measuring) the ancilla. In practice, channels are most often used to model the effects of noise and errors in these circuits.
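One standard way to realize a non-unitary channel using only unitary gates is to add an ancilla qubit, apply a joint unitary, and then discard the ancilla. The sketch below does this for the bit-flip channel: the ancilla is prepared in √(1-p)|0⟩ + √p|1⟩ and controls an X gate on the system. The qubit ordering (ancilla first), the value of p and the test state are assumptions made for illustration.

```python
import numpy as np

p = 0.25
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)

# Controlled-X with the ancilla (first tensor factor) as control
CX = np.kron(P0, I2) + np.kron(P1, X)

# Ancilla amplitudes encode the flip probability
ancilla = np.array([np.sqrt(1 - p), np.sqrt(p)], dtype=complex)
rho_anc = np.outer(ancilla, ancilla.conj())

def dilated_bit_flip(rho_sys):
    joint = CX @ np.kron(rho_anc, rho_sys) @ CX.conj().T
    joint = joint.reshape(2, 2, 2, 2)       # indices: anc, sys, anc', sys'
    return np.einsum("asat->st", joint)     # partial trace over the ancilla

# Compare with the direct definition of the channel on an illustrative state
rho_sys = 0.5 * (I2 + 0.3 * X + 0.6 * Z)
via_dilation = dilated_bit_flip(rho_sys)
direct = (1 - p) * rho_sys + p * X @ rho_sys @ X
```

Tracing out the ancilla removes the coherence between "flipped" and "not flipped", which is what turns the joint unitary evolution into a stochastic map on the system.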



###### *Kraus Operator, Lindblad Operator and Super-Operator*

**super-operator**

In the field of quantum mechanics and quantum information theory, a superoperator is a type of operator that acts on other operators, rather than on states. In the context of quantum mechanics, these "other operators" are usually the density matrices that represent quantum states.

This concept is needed because, for an isolated system, the evolution can be represented as an operator acting on the state vector of the system. In more complicated situations, like a quantum system interacting with an environment (leading to phenomena such as decoherence), the state must be described by a density operator, and the evolution is a linear map that acts on this operator as a whole. This kind of map is represented by a superoperator.

Formally, a superoperator Φ acting on an operator A would be written as Φ(A), and this results in another operator.

One common example of a superoperator is the Liouville superoperator, which appears in the Liouville-von Neumann equation describing the time evolution of a quantum system. If H is the Hamiltonian of the system and ρ is the density operator, the Liouville superoperator L acting on ρ is defined as:

L(ρ) = -i[H, ρ]

where [ , ] denotes the commutator.

***Another important class of superoperators are the completely positive trace preserving (CPTP) maps***, which are used to describe the evolution of quantum systems in the presence of decoherence. A quantum operation (also called a quantum channel) is a CPTP map and can be represented as a superoperator acting on the density matrix of a quantum state.
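Because a superoperator is linear, it can be written as an ordinary matrix acting on the vectorized density matrix. With NumPy's row-major flattening, vec(AρB) = (A ⊗ Bᵀ) vec(ρ), so the Liouville superoperator becomes -i(H ⊗ I - I ⊗ Hᵀ). The Hamiltonian and state below are arbitrary illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Illustrative Hamiltonian and state (not from the source)
H = 0.5 * Z + 0.3 * Y
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

# Matrix form of L(rho) = -i[H, rho] under row-major vectorization
L_super = -1j * (np.kron(H, I2) - np.kron(I2, H.T))

# Apply the superoperator as a 4x4 matrix on the flattened state
via_matrix = (L_super @ rho.reshape(-1)).reshape(2, 2)

# Direct evaluation of the commutator for comparison
direct = -1j * (H @ rho - rho @ H)
```

The same vectorization turns any channel with Kraus operators K_i into the matrix Σ_i K_i ⊗ K_i* (complex conjugate), which is convenient for composing channels by matrix multiplication.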

**What do you need Kraus operators for?**

Kraus operators are a useful tool for representing and working with quantum channels, which are mathematical models that describe the evolution of quantum states due to physical processes. These processes can include both the deterministic dynamics of an isolated quantum system, and the effects of noise, dissipation, and decoherence due to interactions with an environment.

Here are some specific uses for Kraus operators:

1. **Modeling realistic quantum operations**: In quantum computing and quantum information theory, we often need to model not just ideal, noiseless operations (which are represented by unitary transformations), but also realistic, noisy operations. Kraus operators provide a way to do this. Each Kraus operator can represent a different possible outcome or error process that can occur during the operation.

2. **Calculating the effect of a quantum channel**: Given the Kraus operators for a quantum channel, we can calculate how the channel will transform any quantum state. This is useful for predicting the results of quantum computations or quantum communications in the presence of noise.

3. **Studying quantum decoherence**: Kraus operators provide a mathematical framework for studying decoherence, which is the process by which a quantum system loses its quantum properties due to interactions with its environment. Each Kraus operator can represent a different type of interaction with the environment.

4. **Quantum error correction**: Kraus operators can represent different types of errors that can occur in a quantum system. This is crucial for designing quantum error correction codes, which are schemes for detecting and correcting errors in quantum computations or communications.

5. **Quantum process tomography**: This is a technique for experimentally determining the Kraus operators of a quantum channel, which can provide insights into the physical processes occurring in a quantum system.

Overall, Kraus operators provide a flexible and powerful mathematical tool for describing and analyzing the dynamics of quantum systems, especially in the presence of noise and other non-ideal effects.


**What are Lindblad operators?**

Lindblad operators (also sometimes referred to as "jump" operators) are a fundamental component of Lindblad equations, which are used in the field of quantum mechanics to describe the evolution of quantum systems that interact with an environment. This interaction leads to effects such as decoherence and dissipation, which are not captured by the basic Schrödinger equation.

The general form of a Lindblad equation is:

$$\frac{d\rho}{dt} = -i[H, \rho] + \sum_k \gamma_k \left( L_k \rho L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho \} \right)$$

Here:

- H is the Hamiltonian of the system, which describes its energy and governs its unitary (non-dissipative) evolution.
- ρ is the density matrix of the system, which encapsulates the state of the system.
- The brackets [ , ] and { , } denote the commutator and anticommutator respectively.
- The L_k are the Lindblad operators, which describe how the system interacts with its environment.
- The γ_k are positive decay rates associated with each Lindblad operator.

Each Lindblad operator L_k represents a certain type of interaction between the quantum system and its environment. The precise form of these operators depends on the specifics of the system and its environment. For example, in a system of quantum harmonic oscillators, the Lindblad operators might be the creation and annihilation operators for each oscillator.

The non-unitary part of the Lindblad equation (the second term on the right-hand side) describes how these interactions cause the system to decay towards a steady state, and is responsible for phenomena such as relaxation and decoherence. The Lindblad equation is the most general form for the time evolution of a quantum system that ensures complete positivity and trace preservation of the density matrix, two crucial physical requirements.
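As a rough illustration of how the equation above acts on a density matrix, the following sketch integrates it numerically for a single qubit with one jump operator $|0\rangle\langle 1|$ (amplitude damping). The helper names `lindblad_rhs` and `evolve`, the choice $H = 0$, $\hbar = 1$ and the fixed-step Runge-Kutta integrator are assumptions made for this example only.

```python
import numpy as np

def lindblad_rhs(rho, H, jump_ops, rates):
    """Right-hand side of the Lindblad equation (units with hbar = 1)."""
    drho = -1j * (H @ rho - rho @ H)
    for L, gamma in zip(jump_ops, rates):
        LdL = L.conj().T @ L
        drho += gamma * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
    return drho

def evolve(rho0, H, jump_ops, rates, t_final, steps):
    """Integrate the master equation with fixed-step 4th-order Runge-Kutta."""
    rho = rho0.astype(complex)
    dt = t_final / steps
    for _ in range(steps):
        k1 = lindblad_rhs(rho, H, jump_ops, rates)
        k2 = lindblad_rhs(rho + 0.5 * dt * k1, H, jump_ops, rates)
        k3 = lindblad_rhs(rho + 0.5 * dt * k2, H, jump_ops, rates)
        k4 = lindblad_rhs(rho + dt * k3, H, jump_ops, rates)
        rho = rho + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

# Qubit prepared in the excited state |1>, no Hamiltonian, decay rate gamma.
sigma_minus = np.array([[0, 1], [0, 0]], dtype=complex)   # |0><1|
rho0 = np.array([[0, 0], [0, 1]], dtype=complex)          # |1><1|
H = np.zeros((2, 2), dtype=complex)
gamma, t_final = 1.0, 1.0

rho_t = evolve(rho0, H, [sigma_minus], [gamma], t_final, steps=1000)
excited_population = rho_t[1, 1].real   # analytic solution: exp(-gamma * t)
```

For this jump operator the excited-state population obeys $\dot\rho_{11} = -\gamma \rho_{11}$, so the numerical result can be compared against $e^{-\gamma t}$, and the trace of `rho_t` should stay equal to one.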

**What's the difference between Kraus operators and Lindblad operators?**

Kraus operators and Lindblad operators are both tools used to represent and study the evolution of quantum systems, especially when these systems are interacting with an environment. However, they are used in slightly different contexts and represent different types of processes.

1. **Kraus Operators**: Kraus operators are used to represent quantum channels, which are completely positive, trace-preserving (CPTP) maps that describe the evolution of quantum states. The Kraus representation is a way to express a quantum channel as a sum of terms, each of which consists of a Kraus operator acting on the quantum state. The Kraus operators can represent different types of interactions with the environment or different possible outcomes of a quantum process.

2. **Lindblad Operators**: Lindblad operators are used specifically to represent the generators of Markovian, open quantum systems in the Lindblad equation (also called the Gorini–Kossakowski–Sudarshan–Lindblad equation). The Lindblad equation describes the time evolution of a quantum system that is undergoing continuous, irreversible dynamics due to its interaction with a large, uncontrolled environment (a process called "decoherence"). Each Lindblad operator represents a different type of interaction with the environment.

So, in brief: Kraus operators are used to represent discrete transformations of quantum states in general quantum channels, while Lindblad operators are used to represent continuous, Markovian dynamics in open quantum systems.

It's also worth noting that the Lindblad equation is one way to derive Kraus operators for a quantum channel, specifically when the channel represents the evolution of a system over a small time interval under continuous, Markovian dynamics. In this case, the Lindblad operators determine the form of the Kraus operators.
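A standard way to make this connection explicit is a first-order expansion in a short time step $\delta t$ (the symbol $\delta t$ and the labels $K_0, K_k$ are notation introduced here). One possible choice of Kraus operators is

$$K_0 = I - \left( iH + \frac{1}{2} \sum_k \gamma_k L_k^\dagger L_k \right) \delta t, \qquad K_k = \sqrt{\gamma_k\, \delta t}\; L_k .$$

Then

$$K_0 \rho K_0^\dagger + \sum_k K_k \rho K_k^\dagger = \rho + \delta t \left( -i[H, \rho] + \sum_k \gamma_k \left( L_k \rho L_k^\dagger - \frac{1}{2} \{ L_k^\dagger L_k, \rho \} \right) \right) + O(\delta t^2),$$

and the completeness relation holds to the same order: $K_0^\dagger K_0 + \sum_k K_k^\dagger K_k = I + O(\delta t^2)$.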

**Can one say that Kraus operators are bounded, and Lindblad operators unbounded? Or is that wrong?**

In quantum mechanics, both Kraus and Lindblad operators act on a finite-dimensional Hilbert space (when we're considering finite quantum systems such as qubits, which are the building blocks of quantum computers), and hence they are indeed bounded operators. A bounded operator is an operator that doesn't amplify any vector in the Hilbert space by more than a certain factor, the operator norm.

When the quantum system is finite-dimensional, the concepts of "bounded" and "unbounded" don't really come into play, because all operators on a finite-dimensional Hilbert space are automatically bounded.

However, if you are considering infinite-dimensional systems, such as quantum fields or harmonic oscillators, then it becomes meaningful to talk about bounded and unbounded operators. In this case, Lindblad operators (the ones appearing in the Lindblad equation, governing the dynamics of open quantum systems) are typically unbounded.

This difference arises because the Lindblad equation is a differential equation, and the Lindblad operators often represent physical quantities such as position and momentum, which are unbounded operators on infinite-dimensional Hilbert spaces. On the other hand, a Kraus representation of a quantum channel represents the action of the channel on the quantum state at a specific point in time, and this action can usually be expressed in terms of bounded operators.

But to summarize, in the finite-dimensional case (which is the relevant case for most quantum computing applications), both Kraus and Lindblad operators are bounded. In the infinite-dimensional case, Kraus operators are generally still bounded, while Lindblad operators can be unbounded.


**Do you have an example of a Kraus operator and a Lindblad operator in the context of quantum computing?**

Sure, I can provide some simple examples of both Kraus and Lindblad operators in the context of quantum computing.

1. **Kraus Operators**: Kraus operators are used to describe the evolution of quantum states in the presence of noise or decoherence. They form a set of matrices {K_i} such that the sum of the Hermitian conjugates of each K_i times K_i is the identity matrix.

    For instance, let's consider a single qubit undergoing bit flip error with a certain probability p. The error can be described by two Kraus operators:

    K_0 = sqrt(1-p) * I,
    K_1 = sqrt(p) * X,

    where I is the identity operator and X is the Pauli-X operator. These operators satisfy the condition K_0†K_0 + K_1†K_1 = I, which ensures the resulting quantum state is still a valid density matrix.

2. **Lindblad Operators**: Lindblad operators (also known as jump operators) are used in the Lindblad equation, which describes the time evolution of an open quantum system.

    For instance, a common Lindblad operator for a single qubit undergoing amplitude damping (energy relaxation) is:

    L = sqrt(γ) * |0><1|,

    where |0> and |1> are the computational basis states of the qubit, and γ is the damping rate. This Lindblad operator represents the process of the qubit transitioning from the excited state |1> to the ground state |0>.

    In the context of a Lindbladian superoperator (describing the complete evolution), you might see this operator appearing in a term like L ρ L† in the master equation.

These are quite simplified examples. In practice, the Kraus and Lindblad operators for a realistic quantum system could be more complex and might depend on the specific physical details of the system.
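The bit-flip example can be checked numerically. The sketch below applies the channel $\rho \mapsto \sum_i K_i \rho K_i^\dagger$ and verifies the completeness condition; the function names `apply_channel` and `is_trace_preserving` and the value $p = 0.1$ are choices made for this illustration.

```python
import numpy as np

def apply_channel(rho, kraus_ops):
    """Apply a channel in Kraus form: rho -> sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def is_trace_preserving(kraus_ops, atol=1e-12):
    """Check the completeness relation sum_i K_i^dagger K_i = I."""
    dim = kraus_ops[0].shape[0]
    total = sum(K.conj().T @ K for K in kraus_ops)
    return np.allclose(total, np.eye(dim), atol=atol)

p = 0.1                                              # bit-flip probability
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
bit_flip = [np.sqrt(1 - p) * I2, np.sqrt(p) * X]     # K_0, K_1

rho_in = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
rho_out = apply_channel(rho_in, bit_flip)            # (1-p)|0><0| + p|1><1|
```

Starting from $|0\rangle\langle 0|$, the output is the mixture $(1-p)\,|0\rangle\langle 0| + p\,|1\rangle\langle 1|$, which still has unit trace.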



**What is an example of unbounded operators that lead to pathological behavior in the study of infinite-dimensional spaces in functional analysis?**

Unbounded operators in infinite-dimensional spaces often require careful treatment because they can lead to "pathological" behavior, i.e., behavior that violates our usual intuitions or expectations based on finite-dimensional spaces.

One classical example is the momentum operator in quantum mechanics, which is defined on the space of square-integrable functions over the real numbers (which is an infinite-dimensional space). The momentum operator is given by -iħd/dx, where i is the imaginary unit, ħ is the reduced Planck constant, and d/dx is the derivative operator.

This operator is unbounded, meaning there's no finite number B such that the norm of the momentum operator acting on any state is less than or equal to B times the norm of the state. As a result, the momentum operator can't be defined on the entire Hilbert space, but only on a dense subset (here, the square-integrable functions whose derivative is also square-integrable). This is forced by the Hellinger-Toeplitz theorem: a symmetric operator that is defined on the whole Hilbert space is automatically bounded.

One "pathological" behavior of the momentum operator is that it doesn't have any eigenstates that belong to the Hilbert space. In quantum mechanics, we often want to find eigenstates of operators because these correspond to "observable" states of the system with definite values of the associated physical quantity. For the momentum operator, the eigenstates are plane waves, which are not square-integrable and hence do not belong to the Hilbert space.

This behavior is quite different from what we see in finite dimensions, where every Hermitian operator has a complete set of eigenvectors inside the space. (Boundedness alone is not enough in infinite dimensions: multiplication by x on the square-integrable functions over [0, 1] is bounded and still has no eigenvectors in the space.) In order to deal with this kind of issue, physicists often have to introduce additional mathematical structures, such as rigged Hilbert spaces, or use indirect methods, such as the spectral theorem for unbounded self-adjoint operators, to make sense of the physical meaning of unbounded operators like the momentum operator.
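A short worked example shows the unboundedness directly. Take the normalized Gaussians (the family $\psi_\sigma$ is chosen here purely for illustration)

$$\psi_\sigma(x) = (\pi \sigma^2)^{-1/4}\, e^{-x^2 / (2\sigma^2)}, \qquad \|\psi_\sigma\| = 1 .$$

Since $\psi_\sigma'(x) = -\frac{x}{\sigma^2}\, \psi_\sigma(x)$ and $\int x^2 |\psi_\sigma(x)|^2 \, dx = \sigma^2 / 2$, the momentum operator $p = -i\hbar\, d/dx$ gives

$$\| p\, \psi_\sigma \|^2 = \hbar^2 \int |\psi_\sigma'(x)|^2 \, dx = \frac{\hbar^2}{2 \sigma^2} \quad \Longrightarrow \quad \| p\, \psi_\sigma \| = \frac{\hbar}{\sqrt{2}\, \sigma} .$$

Letting $\sigma \to 0$ makes $\| p\, \psi_\sigma \|$ arbitrarily large while $\|\psi_\sigma\|$ stays equal to one, so no finite bound B can exist.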

**Sign structure in quantum operators**

The term "sign structure" is not standard terminology in quantum mechanics, so it may be interpreted in different ways depending on the context. In general, the mathematical operators used in quantum mechanics can have complex values and can act on complex-valued wavefunctions or on operators themselves, so they don't have a "sign" in the way that real numbers do.

However, there are a few contexts where the notion of sign can be important in quantum mechanics:

1. **Parity**: The parity operator is a specific quantum mechanical operator that represents spatial inversion (flipping the sign of the spatial coordinates). Eigenstates of the parity operator can have positive or negative eigenvalues, often interpreted as "even" or "odd" parity.

2. **Spin**: In quantum mechanics, particles like electrons have a property called spin, which can take on positive or negative values (typically +1/2 or -1/2 for electrons).

3. **Commutation relations**: Certain pairs of quantum operators have commutation relations that involve the imaginary unit "i", which could be interpreted as a kind of "sign" structure. For example, the position operator x and momentum operator p have the commutation relation [x, p] = iħ, where ħ is the reduced Planck's constant.

4. **Phase factors**: In quantum mechanics, the overall sign (or more generally, the global phase factor) of a wavefunction is not observable. Relative signs and phases between the components of a superposition are observable, however, and they are what produces interference effects.


**What is meant by the sign of a trace?**

The "sign" of a trace simply refers to whether the trace of a matrix (or operator in the context of quantum mechanics) is positive, negative, or zero.

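The trace of a matrix is the sum of its diagonal elements, which equals the sum of its eigenvalues. If this sum is greater than zero, the trace is positive. If the sum is less than zero, the trace is negative. If the sum is exactly zero, the trace is zero. A positive trace does not make the matrix or operator itself positive: a positive (semidefinite) operator needs all of its eigenvalues to be non-negative.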

This concept is often used in the context of quantum mechanics, where the trace of an operator can provide important physical information. For example, in the context of quantum density matrices, a trace of one is associated with a valid physical state.

Please note that the sign of the trace of a matrix does not necessarily indicate that all elements, or even all eigenvalues, of the matrix are of the same sign. It's possible for a matrix to have both positive and negative eigenvalues but for their sum (i.e., the trace) to be positive, negative, or zero. Similarly, it's possible for a matrix to have positive and negative elements but for their diagonal sum (i.e., the trace) to be positive, negative, or zero.
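A two-line numerical check makes the last point concrete. The matrix `A` below is an arbitrary example chosen for this note: its trace is positive although one eigenvalue is negative.

```python
import numpy as np

# Diagonal example: trace = 2 + (-1) = 1 > 0, but one eigenvalue is negative.
A = np.array([[2.0, 0.0],
              [0.0, -1.0]])

trace_A = np.trace(A)                  # sum of diagonal elements
eigenvalues = np.linalg.eigvalsh(A)    # sorted ascending for Hermitian input
has_negative_eigenvalue = bool(eigenvalues[0] < 0)
```

So the sign of the trace alone says nothing about whether the operator is positive semidefinite.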

###### *Quantum Annealer and Adiabatic Quantum Computing*

* Quantum annealing is a quantum optimization heuristic that can be applied to a variety of problems, including NP-hard optimization problems.
* However, the efficiency of quantum annealing is limited by the **exponential closing of the spectral gap with increasing system size.**
* The spectral gap here is the energy difference between the ground state and the first excited state of the annealing Hamiltonian. **A large gap means the ground state is well separated and easy to follow, while a small gap means the system is easily excited out of the ground state**. (It should not be confused with the spectral gap of a graph Laplacian, which measures graph connectivity.) The exponential closing of the spectral gap means that the minimum gap during the anneal decreases exponentially with increasing system size.
* This means that quantum annealing becomes exponentially slow for solving NP-hard problems as the system size increases. This is because **the adiabatic theorem requires an annealing time that grows roughly with the inverse square of the minimum gap, so it takes longer and longer for the system to reach its ground state reliably**.
* There are a number of ways to address the problem of exponential closing of the spectral gap. The gap is a property of the Hamiltonian path, not of the annealing speed, so one approach is to tailor the schedule (slowing down only near the minimum gap) or to change the path itself, for example with a different driver Hamiltonian.
* Another approach is to combine annealing with quantum error suppression or error correction. These techniques protect the system from noise and thermal excitations, which matter most where the gap is small. They do not enlarge the gap itself.
* The problem of exponential closing of the spectral gap is a major challenge for quantum annealing. However, there are a number of promising approaches to addressing this problem. As quantum annealing technology continues to develop, it is possible that quantum annealing will become a viable solution for solving NP-hard problems.
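To make the role of the gap concrete, the sketch below computes the gap between the two lowest eigenvalues of $H(s) = (1-s)\,H_{\text{driver}} + s\,H_{\text{problem}}$ along a linear anneal by exact diagonalization. The transverse-field driver $-\sum_i X_i$, the tiny 3-qubit Ising instance and the helper names are assumptions made for this illustration; exact diagonalization only works for a handful of qubits.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on(site_op, site, n):
    """Embed a single-qubit operator on qubit `site` of an n-qubit register."""
    factors = [I2] * n
    factors[site] = site_op
    return reduce(np.kron, factors)

def annealing_hamiltonian(s, n, couplings, fields):
    """H(s) = (1 - s) * (-sum_i X_i) + s * (sum_ij J_ij Z_i Z_j + sum_i h_i Z_i)."""
    driver = -sum(op_on(X, i, n) for i in range(n))
    problem = np.zeros((2**n, 2**n))
    for (i, j), J in couplings.items():
        problem += J * op_on(Z, i, n) @ op_on(Z, j, n)
    for i, h in fields.items():
        problem += h * op_on(Z, i, n)
    return (1.0 - s) * driver + s * problem

def minimum_gap(n, couplings, fields, num_points=201):
    """Scan s in [0, 1]; return the location and size of the smallest gap."""
    s_values = np.linspace(0.0, 1.0, num_points)
    gaps = []
    for s in s_values:
        evals = np.linalg.eigvalsh(annealing_hamiltonian(s, n, couplings, fields))
        gaps.append(evals[1] - evals[0])
    gaps = np.array(gaps)
    best = int(np.argmin(gaps))
    return s_values[best], gaps[best], gaps

# Ferromagnetic 3-qubit chain; a small field on qubit 0 makes the ground state unique.
couplings = {(0, 1): -1.0, (1, 2): -1.0}
fields = {0: -0.5}
s_min, gap_min, gaps = minimum_gap(3, couplings, fields)
```

Repeating the scan for growing instance sizes is how the scaling of `gap_min` with system size would be studied in practice, within the limits of what can be diagonalized.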


**Connectivity and Solution Finding in Quantum Annealing**

Quantum annealing is a quantum computing technique used to find the minimum value of a given objective function over a defined set of possible solutions. It draws inspiration from classical annealing, a process in which a system is gradually cooled to find its lowest energy state. The hope in quantum annealing is to leverage quantum properties, such as superposition and tunneling, to more efficiently explore the solution space and find the ground state, which represents the minimum of the objective function.

Connectivity plays an essential role in this process. Let's understand this with a few key points:

1. **Local Minima and Global Minima**: In complex optimization problems, there are often many local minima. Finding the global minimum (or a sufficiently good approximation of it) is the main challenge. High connectivity means that the system can more easily explore the solution space, jumping from one potential solution to another and avoiding getting trapped in local minima.

2. **Quantum Tunneling**: One of the primary advantages of quantum annealing over classical annealing is quantum tunneling. Tunneling allows the system to "pass through" barriers between states, meaning that the system can more easily escape local minima and move towards the global minimum. If the quantum system's connectivity is low, the benefit from quantum tunneling can be diminished.

3. **Effective Exploration of Solution Space**: In a highly connected system, any given qubit (quantum bit) is more likely to interact with many other qubits. <font color="blue">This allows for a richer exploration of the solution space. With fewer connections, the qubits are limited in their interactions, making the exploration less effective.</font>

4. **Scaling Issues**: <font color="blue">as the system size increases, maintaining high connectivity becomes more challenging. The difficulty in keeping high connectivity in larger systems is one reason why quantum annealing's efficiency decreases for more complex NP-hard problems</font>. The sparser the connectivity, the less able the system is to effectively leverage quantum effects.

5. **Error Correction and Noise**: Highly connected quantum systems can be more susceptible to certain types of errors or noise. So, while high connectivity can aid in the exploration of solution space, it can also introduce challenges in terms of maintaining the coherence and reliability of the quantum information.

In summary, high connectivity in quantum annealing is crucial for effectively exploring the solution space, leveraging quantum effects like tunneling, and ensuring the system doesn't get trapped in local minima. As system sizes increase, maintaining this connectivity becomes more challenging, which contributes to the reduced efficiency of quantum annealing for large NP-hard problems.

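> *Does spectral gap in quantum annealing stand for the difference between global and local optimum, or for graph connectivity?* For graph connectivity it would be bad to have a small spectral gap, meanwhile for the optimum it would be potentially good.

*In the quantum annealing literature, the spectral gap is the energy difference between the ground state and the first excited state of the annealing Hamiltonian. It is not graph connectivity (that is the spectral gap of a graph Laplacian, a different notion). A small gap is bad: the anneal has to run longer to stay in the ground state, so it is harder to find the global optimum.*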

*Around the same time, it was argued that QA is exponentially slow for solving NP-hard problems due to exponential closing of the spectral gap with increasing system size*
https://arxiv.org/pdf/2212.13649.pdf

[20] Jorg, T., Krzakala, F., Kurchan, J. & Maggs, A. Simple glass models and their quantum annealing. Physical Review Letters 101, 147204 (2008).

[21] Altshuler, B., Krovi, H. & Roland, J. Adiabatic quantum optimization fails for random instances of NP-complete problems. arXiv preprint arXiv:0908.2782 (2009).

*exponential closing of the quantum gap with increasing problem size* https://www.nature.com/articles/s41467-018-05239-9

*The efficiency of Adiabatic Quantum Annealing is limited by the scaling with system size of the minimum gap that appears between the ground and first excited state in the annealing energy spectrum. In general the algorithm is unable to find the solution to an optimisation problem in polynomial time due to the presence of avoided level crossings at which the gap size closes exponentially with system size.* https://arxiv.org/pdf/2203.06779.pdf
  * A particular problem which has been highlighted is the potential for so called perturbative crossings to form between the low energy eigenstates of the problem Hamiltonian towards the end of the anneal [11, 12]. The corresponding energy gap has been found to be exponentially small with the Hamming distance between the eigenstates, which can generally be expected to grow with the system size [11].

**In quantum annealing, an exponentially small spectral gap generally presents a challenge**: the energy difference between the ground state and the first excited state becomes very small. This small gap makes it easy for the system to be excited out of the ground state, especially when there's noise in the system. "However, efficiency of quantum annealing is limited by the exponential closing of the spectral gap with increasing system size." Problems:
  * In order to stay close to the ground state throughout the annealing process, the evolution parameter $s$ (or $t$) needs to change very slowly. This can lead to very long annealing times, making the annealing process computationally inefficient.
  * Thermal Excitations: thermal fluctuations can easily push system out of ground state because energy difference is small. This makes finding the correct solution (the ground state) less likely.
  * Small spectral gap implies sensitivity to external noise. Small gap can cause system to be easily perturbed away from desired evolution.
  * Techniques and strategies, like optimized annealing schedules or error-correction methods, may be needed to cope with these challenges.
* An exponentially small spectral gap means that the first excited state is close to the ground state in terms of energy, but the implications for solution quality can vary. Depending on the problem, this could mean you have a nearly optimal solution, or it could mean you're still far from the best possible configuration.
  1. **Nature of the Problem**: The importance of the energy difference can vary depending on the nature of the problem you're trying to solve. In some optimization problems, even a small energy difference might correspond to a significant difference in solution quality. In others, the difference might be negligible in practical terms.
  2. **Many-body Systems**: Quantum annealing is often applied to complex many-body systems. In these systems, the difference in energy between the ground state and the first excited state might correspond to a complex reconfiguration of many particles or spins. So, the "distance" between these states in terms of system configuration might be significant even if their energy difference is small.
  3. **Landscapes with Many Local Minima**: In complex optimization problems, there can be many local minima (sub-optimal solutions that are better than their immediate neighbors). The presence of an exponentially small spectral gap might indicate that the system is easily getting trapped in one of these local minima. While the first excited state might be close in energy to the ground state, there might be other states with much higher energies that are also close in terms of the system's configuration.
  4. **Goal of Quantum Annealing**: The primary goal of quantum annealing is to find the global minimum (or a very close approximation to it). So, while the first excited state might offer a "good" solution, the aim is typically to find the "best" solution, especially in problems where small differences can have amplified outcomes.



[Scott Aaronson - Dismantling quantum hype](https://youtu.be/qs0D9sdbKPU)

https://medium.com/@quantum_wa/quantum-annealing-cdb129e96601

https://en.m.wikipedia.org/wiki/Annealing_(materials_science)

Kerr and Ising gate: nonlinear gates

* The Kerr gate is a non-Gaussian gate in continuous-variable (photonic) quantum computing. It applies a phase that depends on the square of the photon number, $K(\kappa) = e^{i \kappa \hat{n}^2}$, and thereby introduces nonlinearity into the system.

* The cubic phase gate introduces a cubic nonlinearity: it is generated by the third power of the position quadrature, for example $V(\gamma) = e^{i \gamma \hat{x}^3 / 3}$ in units with $\hbar = 1$.

* The quartic gate introduces an even higher-order nonlinearity, generated by a fourth-order polynomial in the position quadrature.

* Gaussian gates alone are not universal. Adding any one of these non-Gaussian gates to the Gaussian gate set is enough for universal continuous-variable quantum computation.

**What is the Kerr Ising model?**


The Kerr Ising model (KIM) is a quantum many-body model that combines the features of the Ising model and the Kerr nonlinearity. The Ising model is a model of interacting spins (classical in its original form), while the Kerr nonlinearity is a nonlinear optical effect in which the refractive index of a medium depends on the intensity of the light passing through it.

The KIM is described by the following Hamiltonian:

> $H = -\sum_{i,j} J_{ij} \sigma^z_i \sigma^z_j - \sum_i \frac{\alpha}{2} \left(\sigma^x_i\right)^2$

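where $\sigma^z_i$ and $\sigma^x_i$ are Pauli matrices that represent the spin of the $i$th qubit, $J_{ij}$ is the Ising coupling strength between the $i$th and $j$th qubits, and $\alpha$ is the Kerr nonlinearity coefficient. Note that the square of any Pauli matrix is the identity, so for spin-1/2 operators the second term is only a constant energy shift as written. In Kerr-oscillator proposals the nonlinearity usually acts on bosonic mode operators instead, so this form should be read as schematic.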

The KIM has been studied in a variety of contexts, including quantum annealing, quantum simulation, and quantum information processing. It has been shown that the KIM can be used to solve certain NP-complete problems, and it can also be used to simulate the dynamics of other quantum many-body systems.

Here are some of the key features of the Kerr Ising model:

* It is a quantum many-body model that combines the features of the Ising model and the Kerr nonlinearity.
* It has been shown to be useful for quantum annealing, quantum simulation, and quantum information processing.
* It has been studied in a variety of contexts, including theoretical studies and experimental implementations.

The Kerr Ising model is a promising platform for a variety of quantum applications. As our understanding of the model continues to grow, we can expect to see even more exciting applications in the future.

https://iopscience.iop.org/article/10.1088/1367-2630/ab7255

**What is the difference between adiabatic quantum computing and quantum annealing?**

Adiabatic quantum computing (AQC) and quantum annealing (QA) are both approaches to quantum computing that use a time-dependent Hamiltonian to solve problems. However, there are some key differences between the two approaches.

In AQC, the Hamiltonian is slowly varied from an initial Hamiltonian whose ground state is easy to prepare to a final Hamiltonian whose ground state encodes the solution to the problem. The evolution must be adiabatic, meaning that the system remains in its instantaneous ground state throughout. The adiabatic theorem then guarantees that the system ends up in the ground state of the final Hamiltonian, which is the solution to the problem.

In QA, the Hamiltonian is also varied from an initial Hamiltonian to a final problem Hamiltonian, and the goal is again to reach the ground state of the final Hamiltonian, which is the global minimum of the energy landscape. However, QA is run as a heuristic: the adiabatic condition is not guaranteed, the device is an open system at finite temperature, and the anneal is typically repeated many times, keeping the best result. This is useful for optimization problems, where the goal is to find the minimum value of a function.

Another difference between AQC and QA is that AQC is a universal model of quantum computation, while QA as usually implemented is not believed to be. Universal means that AQC with suitable Hamiltonians can run any quantum algorithm with at most polynomial overhead compared to the circuit model. QA hardware is typically restricted to transverse-field Ising Hamiltonians and is used for optimization and sampling.

In practice, QA is often applied to optimization problems that are difficult for classical computers. Quantum fluctuations can help the system tunnel through narrow energy barriers that classical thermal annealing would have to climb over. However, there is still debate about whether QA provides a quantum speedup over the best classical algorithms.

Here is a table that summarizes the key differences between AQC and QA:

| Feature | Adiabatic Quantum Computing | Quantum Annealing |
|---|---|---|
| Hamiltonian | Varied slowly from an easy-to-prepare initial Hamiltonian to a final Hamiltonian whose ground state encodes the solution; the system must stay in the instantaneous ground state. | Varied from an easy-to-prepare initial Hamiltonian to a final problem Hamiltonian; run as a heuristic, possibly non-adiabatic and at finite temperature. |
| Universality | Yes (polynomially equivalent to the circuit model) | Not believed to be, in typical implementations |
| Applications | Any problem that can be solved by a quantum computer | Optimization and sampling problems |
| Quantum speedup | Same as the circuit model | Debated |


**Imaginary time evolution**


https://physics.stackexchange.com/questions/557225/why-do-we-use-the-imaginary-time-evolution-in-simulations-of-some-quantum-system

Imaginary time is a concept used in quantum mechanics and statistical mechanics. It replaces "real time" with "imaginary time" by a Wick rotation in the complex plane. That is, the time variable $t$ is replaced with $-i\tau$, where $i$ is the imaginary unit and $\tau$ is real, so that the time-evolution operator $e^{-iHt}$ becomes $e^{-H\tau}$.

Imaginary time evolution plays a significant role in several areas of physics, including quantum field theory, statistical mechanics, and quantum computing.

1. **Quantum Field Theory and Statistical Mechanics**: In these fields, the use of imaginary time is often a mathematical trick that simplifies calculations. By transforming to imaginary time, the calculations of quantum mechanics often become calculations in statistical mechanics. This is utilized in the technique called "path integral formulation," where the evolution of a system in imaginary time makes the system go to its lowest energy state or the ground state.

2. **Quantum Computing**: In quantum computing, the idea of imaginary time evolution can be used to design quantum algorithms for tasks such as finding the ground state of a system. Because $e^{-H\tau}$ is not unitary, it cannot be applied directly as a gate and has to be approximated, as in the Quantum Imaginary Time Evolution (QITE) algorithm and in variational imaginary time evolution.

Remember that "imaginary time" is not about time in the sense that we experience it. Instead, it's a mathematical construct that physicists use to solve certain types of problems.

Imaginary time is an unphysical, yet powerful, mathematical concept. It has been utilised in numerous physical domains, including quantum mechanics, statistical mechanics and cosmology. The substitution is often referred to as performing a 'Wick rotation'.
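The ground-state property can be seen in a small classical simulation. The sketch below repeatedly applies a first-order approximation of $e^{-H \delta\tau}$, namely $(I - \delta\tau\, H)$, and renormalizes; excited-state components decay relative to the ground state. The function name, the step size and the two-qubit transverse-field Ising Hamiltonian are assumptions for this example, and this is a classical simulation, not the QITE algorithm itself.

```python
import numpy as np

def imaginary_time_ground_state(H, psi0, dtau=0.05, steps=2000):
    """Project onto the ground state by normalized imaginary-time steps.

    Each step applies (I - dtau * H), a first-order approximation of
    exp(-dtau * H), and then renormalizes the state.
    """
    psi = psi0 / np.linalg.norm(psi0)
    for _ in range(steps):
        psi = psi - dtau * (H @ psi)
        psi = psi / np.linalg.norm(psi)
    energy = float(np.real(psi.conj() @ H @ psi))
    return energy, psi

# Two-qubit transverse-field Ising model: H = -Z Z - 0.5 (X I + I X).
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = -np.kron(Z, Z) - 0.5 * (np.kron(X, I2) + np.kron(I2, X))

psi0 = np.ones(4)     # uniform superposition, overlaps with the ground state
energy, psi = imaginary_time_ground_state(H, psi0)
exact_ground_energy = np.linalg.eigvalsh(H)[0]
```

The start state must have non-zero overlap with the ground state, and `dtau` must be small enough that the ground state is the component amplified the most in each step.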

https://www.nature.com/articles/s41534-019-0187-2

A Quantum Adiabatic Evolution Algorithm Applied to Random Instances of an NP-Complete Problem

https://arxiv.org/abs/quant-ph/0104129

###### *Dicke States (with Garbage)*

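Dicke states are a class of quantum states that are defined as the equal-amplitude superposition of all $n$-qubit computational basis states with Hamming weight $k$:

$|D_n^k\rangle = \binom{n}{k}^{-1/2} \sum_{x \in \{0,1\}^n,\ |x| = k} |x\rangle$

In other words, a Dicke state $|D_n^k\rangle$ is the uniform superposition over all arrangements in which exactly $k$ of the $n$ qubits are in the $|1\rangle$ state and the remaining $n-k$ qubits are in the $|0\rangle$ state. For example, $|D_3^1\rangle = (|001\rangle + |010\rangle + |100\rangle)/\sqrt{3}$ is the three-qubit W state.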
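The definition translates directly into a state-vector construction. The helper `dicke_state` below is a name introduced for this sketch; it builds the amplitude vector classically (it is not a preparation circuit) and uses the convention that qubit 0 is the leftmost bit of the basis-state label.

```python
import numpy as np
from itertools import combinations
from math import comb

def dicke_state(n, k):
    """State vector of the n-qubit Dicke state with Hamming weight k."""
    state = np.zeros(2**n)
    for ones in combinations(range(n), k):
        # Basis-state index with 1s at the chosen qubit positions.
        index = sum(1 << (n - 1 - q) for q in ones)
        state[index] = 1.0
    return state / np.sqrt(comb(n, k))

w_state = dicke_state(3, 1)        # (|001> + |010> + |100>) / sqrt(3)
support = np.flatnonzero(w_state)  # indices of basis states with non-zero amplitude
```

The vector has $\binom{n}{k}$ non-zero entries, all equal to $\binom{n}{k}^{-1/2}$.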

Dicke states are important for quantum computing for a number of reasons. First, for $0 < k < n$ they are entangled states, which means that the qubits in a Dicke state are correlated with each other in a way that cannot be explained by classical physics (the cases $k = 0$ and $k = n$ are product states). Entanglement is a resource for many quantum protocols, although entanglement alone does not guarantee a speedup over classical algorithms.

Second, Dicke states are relatively easy to prepare on a quantum computer. This is because there are a number of efficient quantum circuits that can be used to create Dicke states. This makes them a good candidate for use in practical quantum computing applications.

Third, Dicke states have a number of other applications in quantum information science. For example, they can be used to create quantum memories, quantum communication channels, and quantum simulators.

Here are some specific examples of the applications of Dicke states in quantum computing:

* **Quantum algorithms:** Dicke states are used as initial states for constrained optimization, for example in QAOA variants with mixers that preserve the Hamming weight, where every feasible solution has exactly $k$ ones.
* **Quantum communication:** Dicke states (in particular W states, $k = 1$) have been proposed as resources for multiparty protocols such as quantum networking and secret sharing.
* **Quantum simulation:** Dicke states can be used to simulate the behavior of physical systems, such as molecules and materials.

Overall, Dicke states are an important class of quantum states with a wide range of applications in quantum computing. They are relatively easy to prepare on a quantum computer and serve as building blocks for quantum algorithms and multiparty communication protocols.

In the quantum-algorithms literature, "with garbage" usually means that the desired state is entangled with an extra ancilla register holding leftover information. A Dicke state with garbage then has the form

$\binom{n}{k}^{-1/2} \sum_{x \in \{0,1\}^n,\ |x| = k} |x\rangle |g_x\rangle$

where $|g_x\rangle$ is some ancilla state that depends on $x$. Such a state is typically easier to prepare than the clean Dicke state, because the garbage does not have to be uncomputed. It is not interchangeable with the clean state, though: the garbage generally spoils interference between the different $|x\rangle$ branches.

Two cautions. Garbage qubits are not a noise-protection mechanism: ancillas in a random state do not absorb errors from the data qubits, which is what quantum error correction is for. And applying a Hadamard gate to every qubit of $|0\rangle^{\otimes n}$ does not create a Dicke state; it creates the uniform superposition over all $2^n$ basis states, which contains every Hamming weight.

###### *Stabilizer Circuits (Clifford Group & Pauli Gate)*

https://en.m.wikipedia.org/wiki/Stabilizer_code

https://en.m.wikipedia.org/wiki/Five-qubit_error_correcting_code

Video: [Linear Growth of Quantum Circuit Complexity](https://www.youtube.com/watch?v=McfxzoMw1Ig)

Video: [Estimating the Fault Tolerant Cost of Classically Intractable Quantum Computations](https://www.youtube.com/watch?v=H1UdcAdw5kg)

Video: [Crypto and Privacy Village 2021 Day 1](https://www.youtube.com/watch?v=8dcl9GTPebU)

**Why is the Pauli z operator a stabilizer for state |0>?**

The Pauli-Z operator $Z$ is given by:

$
Z = \begin{pmatrix}
1 & 0 \\
0 & -1 \\
\end{pmatrix}
$

A stabilizer for a quantum state $|\psi\rangle$ is an operator $S$ such that:

$
S |\psi\rangle = |\psi\rangle
$

That is, applying the stabilizer leaves the state unchanged.

Now, let's see if the Pauli-Z operator is a stabilizer for the state $|0\rangle$. The state $|0\rangle$ is represented as:

$
|0\rangle = \begin{pmatrix}
1 \\
0 \\
\end{pmatrix}
$

Applying the Pauli-Z operator to this state:

$
Z |0\rangle = \begin{pmatrix}
1 & 0 \\
0 & -1 \\
\end{pmatrix}
\begin{pmatrix}
1 \\
0 \\
\end{pmatrix}
= \begin{pmatrix}
1 \\
0 \\
\end{pmatrix}
= |0\rangle
$

Indeed, $Z |0\rangle = |0\rangle$, so the Pauli-Z operator is a stabilizer for the state $|0\rangle$.

The identity operator $I$ is also a stabilizer!
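
The stabilizer condition $S|\psi\rangle = |\psi\rangle$ is easy to check numerically. The following is a minimal NumPy sketch; the helper name `stabilizes` is hypothetical and only used for illustration.

```python
import numpy as np

# Single-qubit operators and computational basis states.
I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

def stabilizes(op, state):
    """Return True if op @ state equals state (eigenvalue exactly +1)."""
    return np.allclose(op @ state, state)

print(stabilizes(Z, ket0))   # Z|0> = |0>
print(stabilizes(I, ket0))   # the identity stabilizes every state
print(stabilizes(Z, ket1))   # Z|1> = -|1>: eigenvalue -1, so not a stabilizer
print(stabilizes(-Z, ket1))  # -Z|1> = |1>
print(stabilizes(X, ket0))   # X|0> = |1>
```

Note that $|1\rangle$ is stabilized by $-Z$ rather than $Z$: the sign of the operator matters.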

**Stabilizer Circuits:**  *Generate stabilizer states (don't change eigenvalues of Pauli matrices), but can be efficiently simulated classically + are not universal. Sometimes are transversal (=automatically fault-tolerant).*

  * **Group Generators and Stabilizer States**:

    * [Clifford group](https://en.m.wikipedia.org/wiki/Clifford_gates) consists of a set of n-qubit operations generated by three gates: CNOT, H and S. [Constructing a single-qubit gate from S, H and Pauli gates](https://quantumcomputing.stackexchange.com/questions/29035/constructing-a-single-qubit-gate-from-s-h-and-pauli-gates)

    * **The Clifford group is the group of unitaries $U$ that map the Pauli group (generated by {I, X, Y, Z}) onto itself under conjugation: for every Pauli operator $P$, $U P U^\dagger$ is again in the Pauli group**

      * The Pauli matrices are [involutory](https://en.m.wikipedia.org/wiki/Involutory_matrix) (a square matrix that is its own inverse): I^2 = X^2 = Y^2 = Z^2 = -i X Y Z = I

      * The Pauli matrices also [anti-commute](https://en.m.wikipedia.org/wiki/Anticommutative_property), for example Z X = i Y = -X Z

      * The [rotation operators](https://en.m.wikipedia.org/wiki/List_of_quantum_logic_gates#Rotation_operator_gates) Rx(θ), Ry(θ), Rz(θ), the [phase shift gate](https://en.m.wikipedia.org/wiki/Quantum_logic_gate#Phase_shift_gates) P(φ) and [CNOT](https://en.m.wikipedia.org/wiki/Quantum_logic_gate#CNOT) form a universal set of quantum gates [Source](https://en.m.wikipedia.org/wiki/Quantum_logic_gate) (due to phase shift it is universal)

    * Stabilizer states: Clifford gates "preserve" Pauli group: when you apply a Clifford gate to a Pauli operator, the result is another Pauli operator. Clifford group generates [stabilizer states](https://en.m.wikipedia.org/wiki/Stabilizer_code) or stabilizer circuits = circuits that only consist of gates from the normalizer of the qubit Pauli group = CNOT, Hadamard, and phase gate.

    * Clifford gates can be implemented efficiently in many physical systems. In error-corrected schemes based on the surface code they are NOT expensive, in contrast to non-Clifford gates such as the T gate.

    * Clifford-group elements take stabilizer groups to stabilizer groups and thus stabilizer states to stabilizer states. A Clifford unitary can be multiplied by a phase without changing how it conjugates operators. http://info.phys.unm.edu/~caves/reports/stabilizer.pdf. https://arxiv.org/pdf/1603.03999.pdf. Paper: [On Clifford groups in quantum computing](https://arxiv.org/abs/1810.10259)
  
  * **Bad thing**: Stabilizer circuits can be efficiently and perfectly simulated classically according to the [Gottesman–Knill theorem](https://en.m.wikipedia.org/wiki/Gottesman%E2%80%93Knill_theorem) (in polynomial time on a probabilistic classical computer). This covers circuits that consist only of:

    1. Preparation of qubits in computational basis states,
    2. Clifford gates (Hadamard gates, controlled NOT gates, phase gate S ), and
    3. Measurements in the computational basis.

  * **Solution**: Because there are quantum operations that don't preserve the Pauli group, we need non-Clifford gates (like the T gate) to achieve universal quantum computing. [Magic States](https://en.m.wikipedia.org/wiki/Magic_state_distillation) can help to implement them efficiently. <font color="red"> Note: why can't you simulate a T-gate in polynomial time?</font>

    * Universal Quantum Gates = Any quantum operation can be constructed to any desired level of precision.

    * You need: Clifford gates (H, S, CNOT) + a non-Clifford gate (T). Alternatively: Toffoli gate + H. Alternatively: single-qubit rotation gates (Rx, Ry, Rz) + CNOT gate + phase shift (incl. Z, S, T gate).

    * Classically: One can show that the AND, NOT and OR gates form a universal set, which means that any logical operation on an arbitrary number of bits can be expressed in terms of these gates. Thus, classically, we only need 1-bit and 2-bit gates to perform any operation. https://young.physics.ucsc.edu/150/gates.pdf
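
The Pauli identities and the Clifford property listed above can be checked numerically. This is a small NumPy sketch: the helpers `as_pauli` and `conjugate` are hypothetical names introduced here. It shows that H and S send X to another Pauli operator under conjugation, while the T gate does not.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]])
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])

PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}
PHASES = {"+": 1, "-": -1, "+i": 1j, "-i": -1j}

def as_pauli(M):
    """Return a label such as '-Y' if M is a Pauli matrix times +-1 or +-i, else None."""
    for sign, phase in PHASES.items():
        for name, P in PAULIS.items():
            if np.allclose(M, phase * P):
                return sign + name
    return None

def conjugate(U, P):
    """Conjugation U P U^dagger."""
    return U @ P @ U.conj().T

# Involution and anti-commutation identities from the list above.
print(np.allclose(X @ X, I), np.allclose(-1j * X @ Y @ Z, I))
print(np.allclose(Z @ X, 1j * Y), np.allclose(Z @ X, -X @ Z))

# Clifford gates keep X inside the Pauli group; T does not.
print(as_pauli(conjugate(H, X)))  # H X H^dagger
print(as_pauli(conjugate(S, X)))  # S X S^dagger
print(as_pauli(conjugate(T, X)))  # T X T^dagger = (X + Y)/sqrt(2), not a Pauli
```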

**Synthesizing**
* a rotation (with T gates): iteratively find a sequence of Clifford and T gates that implements a desired rotation to a given precision with the minimum number of T gates.
* T-count optimization algorithms run in polynomial time and make the circuit more efficient and cheaper to run fault-tolerantly. Example: the rotation that takes the state |0⟩ to the state |+⟩ is the Hadamard gate, which is Clifford and needs no T gate at all, whereas a Z-rotation by π/4 is exactly one T gate.
* https://www.nature.com/articles/s41534-022-00651-y


*Pauli Gates*

https://medium.com/quantum-untangled/visualizing-quantum-logic-gates-part-1-515bb7b58916

Nice visualisations: https://medium.com/analytics-vidhya/quantum-gates-7fe83817b684

https://pme.uchicago.edu/awschalom-group/all-optical-holonomic-single-qubit-gates


**Pauli-X Gate (Flip Computational States)**

> $X=\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right]=\sigma_{x}=\mathrm{NOT}$

* The Pauli- $X$ gate is the quantum equivalent of the [NOT gate](https://en.m.wikipedia.org/wiki/Inverter_(logic_gate)) for classical computers (its main function is to invert the input signal applied) with respect to the standard basis $|0\rangle,|1\rangle$, which distinguishes the $z$ axis on the Bloch sphere. It is sometimes called a bit-flip as it maps $|0\rangle$ to $|1\rangle$ and $|1\rangle$ to $|0\rangle .$

* The Pauli gates $(X, Y, Z)$ are the three Pauli matrices $\left(\sigma_{x}, \sigma_{y}, \sigma_{z}\right)$ and act on a single qubit. The Pauli $X$, $Y$ and $Z$ equate, respectively, to a rotation around the $x, y$ and $z$ axes of the Bloch sphere by $\pi$ radians.



**Square Root of Pauli-X Gate**

The square root of NOT gate (or square root of Pauli- $X, \sqrt{X}$ ) acts on a single qubit. It maps the basis state $|0\rangle$ to $\frac{(1+i)|0\rangle+(1-i)|1\rangle}{2}$ and $|1\rangle$ to $\frac{(1-i)|0\rangle+(1+i)|1\rangle}{2} .$

In matrix form it is given by

>$
\sqrt{X}=\sqrt{\mathrm{NOT}}=\frac{1}{2}\left[\begin{array}{cc}
1+i & 1-i \\
1-i & 1+i
\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}
e^{i \pi / 4} & e^{-i \pi / 4} \\
e^{-i \pi / 4} & e^{i \pi / 4}
\end{array}\right]$

such that

>$
(\sqrt{X})^{2}=(\sqrt{\mathrm{NOT}})^{2}=\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right]=X
$

> *Up to a global phase, this operation represents a rotation of $\pi / 2$ about the $x$-axis of the Bloch sphere*.
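
A quick numerical check of the matrix above, as a NumPy sketch. The comparison with $R_x(\pi/2) = e^{-i \frac{\pi}{4} X}$ is an addition for illustration and uses the standard rotation-operator convention.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

# sqrt(X) in the two equivalent forms given above.
sqrt_x = 0.5 * np.array([[1 + 1j, 1 - 1j],
                         [1 - 1j, 1 + 1j]])
w = np.exp(1j * np.pi / 4)
sqrt_x_polar = np.array([[w, np.conj(w)],
                         [np.conj(w), w]]) / np.sqrt(2)

# Rx(pi/2) = exp(-i*pi/4 * X) = (I - iX)/sqrt(2).
rx_half_pi = (np.eye(2) - 1j * X) / np.sqrt(2)

print(np.allclose(sqrt_x @ sqrt_x, X))      # squares to X
print(np.allclose(sqrt_x, sqrt_x_polar))    # both forms agree
print(np.allclose(sqrt_x, w * rx_half_pi))  # equals Rx(pi/2) up to the phase e^{i*pi/4}
```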

**Pauli-Y Gate (Phase Flip between i and -i)**

> $Y=\sigma_{y}=\left[\begin{array}{cc}0 & -i \\ i & 0\end{array}\right]$

* Similarly, the Pauli-$Y$ maps $|0\rangle$ to $i|1\rangle$ and $|1\rangle$ to $-i|0\rangle$ (a NOT gate with an additional factor of $\pm i$).


**Pauli-Z Gate ($\pi$ Flip Phase between +1 and -1)**

> $Z=\left[\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right]=\sigma_{z}\quad Z=|0\rangle\langle 0|-| 1\rangle\langle 1|$

* Pauli Z gate is a phase flip gate that causes rotation around the z-axis by π radians. Z flips between the poles of the X axis (|+⟩ and |-⟩).

* *The Z gate flips between the two opposite Hadamard directions on the X axis. We change the phase (or sign) on a qubit*:

> $\mathbf{Z}|0\rangle=|0\rangle$

> $\mathbf{Z}|1\rangle=-|1\rangle$

* Since |0⟩ and |1⟩ lie on the z-axis, the Z-gate does not move these states on the Bloch sphere (|1⟩ only picks up a global phase of -1). To put it in other terms *|0⟩ and |1⟩ are the two eigenstates of the Z-gate*. On the other hand, it flips |+⟩ to |-⟩ and |-⟩ to |+⟩.

* Pauli $Z$ leaves the basis state $|0\rangle$ unchanged and maps $|1\rangle$ to $-|1\rangle$. Because of this it is sometimes called *phase-flip*: in a superposition it flips the sign of the $|1\rangle$ component.

* *The Z gate is like the X gate but in the Hadamard basis, flipping $|+\rangle$ and $|-\rangle$*

**S Gate (Phase Gate, not a Pauli gate): $\frac{\pi}{2}$ Flip Phase between Z and Y (with $\mu$=i and $\nu$=-i)**

> ${S}=\left[\begin{array}{cc}1 & 0 \\ 0 & i\end{array}\right]=\left[\begin{array}{cc}1 & 0 \\ 0 & \sqrt{-1}\end{array}\right]=\sqrt{\mathbf{Z}}$

* The phase gate $\mathbf{Z}$ transforms $|+\rangle$ to $|-\rangle$, and we can find the gate that does "half of" this transformation by finding the square root of the matrix $\mathbf{Z}$

* Fine-tuning the Z gate in the Hadamard basis: the S gate is half of the Z gate, and the R gate is a quarter of the Z gate

* Use the matrix $\mathbf{S}$ to find the state $|\mu\rangle$ which is "halfway" between $|+\rangle$ and $|-\rangle$ :

> $\mathbf{S}|+\rangle=|\mu\rangle =\frac{1}{\sqrt{2}}(|0\rangle+i|1\rangle)$

If you begin with $|+\rangle$ and apply the $\mathbf{S}$ gate three times in a row, you find a new state, $|\nu\rangle$ which appears to mirror the complex state $|\mu\rangle$ :

> $|\mu\rangle=\frac{1}{\sqrt{2}}(|0\rangle+i|1\rangle)$

> $|\nu\rangle=\frac{1}{\sqrt{2}}(|0\rangle-i|1\rangle) .$
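
The "halfway" picture can be verified directly: applying $\mathbf{S}$ once, twice and three times to $|+\rangle$ should give $|\mu\rangle$, $|-\rangle$ and $|\nu\rangle$. A minimal NumPy sketch (the variable names are chosen here for illustration):

```python
import numpy as np

S = np.array([[1, 0], [0, 1j]])
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
mu = np.array([1, 1j]) / np.sqrt(2)
nu = np.array([1, -1j]) / np.sqrt(2)

one_s = S @ plus            # S|+>
two_s = S @ S @ plus        # S^2|+> = Z|+>
three_s = S @ S @ S @ plus  # S^3|+>

print(np.allclose(one_s, mu))
print(np.allclose(two_s, minus))
print(np.allclose(three_s, nu))
```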

We've already uncovered the gates we need to perform any rotations around the $x$ - and $z$-axes, which can bring us from an initialized qubit to any other quantum state.

Unfortunately, this result comes with a pretty serious caveat: all of the gates we have used so far are discrete. They perform rotations around the Bloch sphere, but since they rotate in discrete hops (of $\pi / 2$ or $\pi$ ), they are unable to reach most of the intermediate states on the surface of the sphere.

One solution to this problem is to define smaller and smaller rotations around these axes. The gate $\mathbf{S}$ is equal to $\sqrt{\mathbf{Z}}$ and halves its rotation angle from $\pi$ to $\pi / 2$. Even greater division of these gates is possible, and some of them have common names:

$\mathbf{Z}=\left[\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right]$ Rotation: $\pi$

$\mathbf{S}=\sqrt[2]{\mathbf{Z}}=\left[\begin{array}{cc}1 & 0 \\ 0 & i\end{array}\right]$ Rotation: $\pi / 2$


$\mathbf{T}=\sqrt[4]{\mathbf{Z}}=\left[\begin{array}{cc}1 & 0 \\ 0 & e^{i \pi / 4}\end{array}\right] \quad$ Rotation: $\pi / 4$ (not Clifford)

$\mathbf{R 8}=\sqrt[8]{\mathbf{Z}}=\left[\begin{array}{cc}1 & 0 \\ 0 & e^{i \pi / 8}\end{array}\right] \quad$ Rotation: $\pi / 8$ (not Clifford)
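
The root relations in this table can be confirmed with a few lines of NumPy. In this sketch `phase_gate` is a hypothetical helper that builds $\mathrm{diag}(1, e^{i\varphi})$.

```python
import numpy as np

def phase_gate(phi):
    """Diagonal gate diag(1, e^{i*phi}): a rotation by phi about the z-axis (up to global phase)."""
    return np.diag([1, np.exp(1j * phi)])

Z = phase_gate(np.pi)
S = phase_gate(np.pi / 2)
T = phase_gate(np.pi / 4)
R8 = phase_gate(np.pi / 8)

power = np.linalg.matrix_power
print(np.allclose(power(S, 2), Z))   # S is the square root of Z
print(np.allclose(power(T, 4), Z))   # T is the fourth root of Z
print(np.allclose(power(R8, 8), Z))  # R8 is the eighth root of Z
print(np.allclose(power(T, 2), S))   # and T^2 = S
```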

**Swap Gate**

*2 Qubits - Swap Gate (& Swap Square root): Swap two Qubits (like in Quantum Fourier Transform)*

The swap gate **swaps two qubits**.

With respect to the basis $|00\rangle,|01\rangle,|10\rangle,|11\rangle$, it is represented by the matrix:

>$
\text { SWAP }=\left[\begin{array}{llll}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{array}\right]
$

**Square root of swap gate**

The $\sqrt{\text { SWAP }}$ gate performs half of a two-qubit swap.

It is universal in the sense that any many-qubit gate can be constructed from only $\sqrt{\text { SWAP }}$ and single-qubit gates.

The $\sqrt{\text { SWAP }}$ gate is not, however, maximally entangling; more than one application of it is required to produce a Bell state from product states.

With respect to the basis $|00\rangle,|01\rangle,|10\rangle,|11\rangle$, it is represented by the matrix:

>$
\sqrt{\mathrm{SWAP}}=\left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & \frac{1}{2}(1+i) & \frac{1}{2}(1-i) & 0 \\
0 & \frac{1}{2}(1-i) & \frac{1}{2}(1+i) & 0 \\
0 & 0 & 0 & 1
\end{array}\right]
$

This gate arises naturally in systems that exploit exchange interaction.
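
As a sanity check on the two matrices above, the following NumPy sketch verifies that $\sqrt{\text{SWAP}}$ is unitary and squares to SWAP, and that SWAP exchanges $|01\rangle$ and $|10\rangle$. The basis ordering $|00\rangle,|01\rangle,|10\rangle,|11\rangle$ follows the text; variable names are chosen here for illustration.

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

a, b = (1 + 1j) / 2, (1 - 1j) / 2
SQRT_SWAP = np.array([[1, 0, 0, 0],
                      [0, a, b, 0],
                      [0, b, a, 0],
                      [0, 0, 0, 1]])

ket01 = np.array([0, 1, 0, 0], dtype=complex)
ket10 = np.array([0, 0, 1, 0], dtype=complex)

print(np.allclose(SQRT_SWAP @ SQRT_SWAP, SWAP))              # two half-swaps make a swap
print(np.allclose(SQRT_SWAP @ SQRT_SWAP.conj().T, np.eye(4)))  # unitarity
print(np.allclose(SWAP @ ket01, ket10))                      # |01> -> |10>
```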

###### *Transversal Gate (Eastin–Knill theorem)*

**Transversality**

* **Transversal gates**: a gate is transversal when "the logical operation is achieved by broadcasting the same operation over the physical qubits" [Source](https://quantumcomputing.stackexchange.com/questions/24269/what-is-formally-a-transversal-operator). Transversal gates are automatically fault-tolerant, because they do not spread errors between qubits of the same code block.

* No quantum error-correcting code can implement a universal gate set transversally, see [Eastin-Knill theorem](https://en.m.wikipedia.org/wiki/Eastin%E2%80%93Knill_theorem). In codes whose Clifford gates are transversal (such as the Steane code), a non-Clifford gate like the T gate therefore cannot be transversal as well. The reason is that the set of transversal gates is a finite group. **A finite group - unlike a finite set - cannot approximate an infinite set of all unitaries arbitrarily well**. [Source](https://quantumcomputing.stackexchange.com/questions/28057/how-to-intuitively-understand-why-t-gate-cant-be-implemented-transversally). So you need at least one non-transversal gate for universality.
  
* Non-transversal gates have higher depth (a dominant contributor to the fault-tolerant threshold). Goal: simplify these implementations, for example with magic state distillation. [Source](https://quantumcomputing.stackexchange.com/questions/21580/why-do-we-care-about-the-number-of-t-gates-in-a-quantum-circuit)

* [Transversal logical gate for Stabilizer (or at least Steane code)](https://quantumcomputing.stackexchange.com/questions/15301/transversal-logical-gate-for-stabilizer-or-at-least-steane-code) and [errorcorrectionzoo](https://errorcorrectionzoo.org/list/quantum_transversal)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1639.png)

https://arxiv.org/abs/0811.4262

https://en.m.wikipedia.org/wiki/Eastin%E2%80%93Knill_theorem

###### *Magic State Distillation*

**Distillation & Magic States** *Magic states are resource states that, together with Clifford gates, allow for universal quantum computing*
  
  * Magic states are specific quantum states that can be used to implement non-Clifford gates (like the T gate) in a fault-tolerant way. Note the distinction: a magic state is a state, whereas a non-Clifford gate such as the T gate is an operation. Consuming one copy of a suitable magic state with Clifford operations implements one non-Clifford gate.

  * [Magic States](https://en.m.wikipedia.org/wiki/Magic_state_distillation) are non-stabilizer states that, combined with Clifford gates, can be used to implement non-Clifford gates and hence perform any quantum algorithm. They are not stabilizer states (they are not stabilized by a full set of Pauli operators) and have very nice properties with respect to fault-tolerant quantum computation. <font color="red"> Note: what are these properties?</font>  [what-are-magic-states](https://quantumcomputing.stackexchange.com/questions/13629/what-are-magic-states)
  
  * Magic States are a key component of the ["magic state **distillation**"](https://en.m.wikipedia.org/wiki/Magic_state_distillation) protocol, a technique used in quantum error correction. Then, using a procedure called **gate injection or state injection**, these magic states can be used to effectively implement non-Clifford gates in a fault-tolerant manner. This is done by preparing the magic state, performing some Clifford operations and measurements (which can be done fault-tolerantly), and using the outcomes to adaptively apply further Clifford operations.
    
  * **Distillation**: This protocol involves preparing many noisy copies of a magic state, then performing a series of operations to "distill" these into fewer, higher-fidelity magic states, i.e. purifying a quantum state by combining multiple copies of it. The goal is to produce a state with high fidelity (closer to the desired state, e.g. the ideal T state, than the input states). The T state can be distilled efficiently (relative to alternatives) because there are a number of different distillation protocols for it that have high yields (high efficiency).
      
  * https://en.m.wikipedia.org/wiki/Five-qubit_error_correcting_code

  * https://www.nature.com/articles/s41534-022-00666-5

**quantum gate injection or magic state injection**

https://quantumcomputing.stackexchange.com/questions/28481/is-this-kind-of-t-injection-correct
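
To make gate injection concrete, here is a small state-vector sketch in NumPy of one common T-injection circuit. It is an illustration under stated assumptions, not necessarily the circuit discussed in the link above: the ancilla is prepared in $|T\rangle = T|+\rangle$, a CNOT acts with the data qubit as control and the ancilla as target, the ancilla is measured in the computational basis, and an $S$ correction is applied when the outcome is 1. The function names `inject_t` and `same_up_to_phase` are hypothetical.

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
S = np.diag([1, 1j])

# CNOT with the data qubit as control and the ancilla as target,
# in the basis |data, ancilla> = |00>, |01>, |10>, |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Magic state |T> = T|+> = (|0> + e^{i*pi/4}|1>)/sqrt(2).
magic = T @ (np.array([1, 1]) / np.sqrt(2))

def inject_t(psi, outcome):
    """Consume one |T> ancilla to apply T to `psi`, for a given ancilla outcome (0 or 1)."""
    joint = CNOT @ np.kron(psi, magic)
    # Rows index the data qubit, columns the ancilla: keep the measured column.
    data = joint.reshape(2, 2)[:, outcome]
    data = data / np.linalg.norm(data)
    if outcome == 1:
        data = S @ data  # Clifford correction, since S T^dagger = T
    return data

def same_up_to_phase(a, b):
    """True if the normalized vectors a and b differ only by a global phase."""
    return bool(np.isclose(abs(np.vdot(a, b)), 1.0))

psi = np.array([0.6, 0.8j])  # arbitrary normalized test state
for outcome in (0, 1):
    print(outcome, same_up_to_phase(inject_t(psi, outcome), T @ psi))
```

Only Clifford operations (CNOT, measurement, $S$) act on the data qubit; the non-Clifford resource sits entirely in the ancilla state.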

**lattice surgery**: Lattice surgery is a method to perform quantum computation fault-tolerantly by using operations on boundary qubits between different patches of the planar code.


**It's a kind of magic.**

Quantum states can be much less magical than they computationally appear to be.

*A little magic means a lot*

https://arxiv.org/abs/2308.16228

Notions of so-called magic quantify how non-classical quantum states are in a precise sense: high values of magic are necessary to demonstrate a quantum advantage in quantum computing; they also play a key role in quantum error correction.

In this work, we introduce the phenomenon of 'pseudomagic' - wherein certain ensembles of quantum states with low magic are computationally indistinguishable from quantum states with high magic. Previously, such computational indistinguishability has been studied with respect to entanglement, by introducing the notion of pseudoentanglement. However, we show that pseudomagic neither follows from pseudoentanglement, nor implies it.

In terms of applications, pseudomagic sheds new light on the theory of quantum chaos: it reveals the existence of states that, although built from non-chaotic unitaries, cannot be distinguished from random chaotic states by any physical observer. Further applications include new lower bounds on state synthesis problems, property testing protocols, as well as implications for quantum cryptography.

Our results have the conceptual implication that magic is a 'hide-able' property of quantum states: some states have a lot more magic than meets the (computationally-bounded) eye. From the physics perspective, it advocates the mindset that the only physical properties that can be measured in a laboratory are those that are efficiently computationally detectable.

**Can you give me an example of magic state distillation in quantum computing?**

Magic state distillation is a protocol used to produce high-fidelity "magic states" from noisy versions of those states. These magic states can be used to perform non-Clifford gates, which are essential for universal quantum computation but are generally not directly implementable in a fault-tolerant manner.

The core idea is to take multiple copies of a noisy magic state, process them together using quantum operations, and produce a smaller number of less-noisy magic states.

Here's a simplified outline of the process using the $|T\rangle$ state, a commonly used magic state:

1. **Preparation**: Prepare multiple copies of the noisy $|T\rangle$ state:
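$|T\rangle = T|+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i \frac{\pi}{4}}|1\rangle\right)$, which matches the density matrix $\frac{1}{2}\left(I + \frac{1}{\sqrt{2}}(X+Y)\right)$ quoted for the T state in the T-gate section.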

2. **Encoding**: Encode these states using a specially designed quantum error-correcting code. For our purposes, think of this as a way to "spread out" the errors.

3. **Verification**: Perform a quantum error-detection procedure on the encoded states. If the procedure detects an error, the states are discarded, and the process starts over.

4. **Decoding**: If no error is detected, decode the protected information to produce a smaller number of $|T\rangle$ states that have a higher fidelity than the initial states.

The process can be iteratively repeated, using the distilled states as the input for another round of distillation until the desired fidelity is achieved.

While this is a conceptual overview, the real magic state distillation involves a deeper dive into the specifics of the error-correcting codes and quantum circuits used for encoding, verification, and decoding. Researchers have proposed various protocols to optimize the trade-offs in terms of the number of initial states needed, the final fidelity achieved, and the overhead of the quantum operations.

*(2004) Fault-Tolerant Postselected Quantum Computation: Schemes (**Magic State Distillation**)*

https://arxiv.org/abs/quant-ph/0402171

https://arxiv.org/abs/quant-ph/0403025

https://en.m.wikipedia.org/wiki/Magic_state_distillation

https://arxiv.org/abs/1209.2426

Magic state distillation is a process in quantum computing which aims to produce high-quality, or "magic", quantum states from imperfect ones, thus helping to correct errors that arise due to the imperfections of physical qubits and quantum gates. This method is used to implement non-Clifford gates, which are a crucial building block for universal quantum computation, but are particularly prone to noise and errors.

The term "distillation" is used to evoke the idea of refining or purifying a substance, just like in chemistry. In this context, multiple imperfect copies of a quantum state are combined and processed to produce fewer copies with higher fidelity.

Magic state distillation is an important part of some quantum error correction schemes, such as those based on the surface code. These schemes are key to creating reliable and scalable quantum computers. However, as of 2021, practical, large-scale implementations of magic state distillation had yet to be realized, as the process is quite resource-intensive.


**Magic States**

* If we have access to certain specific quantum states, known as magic states, then even with only Clifford gates, we can perform universal quantum computation. The magic states are consumed as a resource (via state injection), and magic state distillation is the protocol that supplies them at high enough fidelity.

* These magic states are especially important in fault-tolerant quantum computing because they can be "distilled": high-fidelity magic states can be produced even when we start with imperfect copies. This makes them very useful in designing fault-tolerant quantum computers, as they can help us overcome the errors that inevitably occur in any real-world quantum system.

Magic states are purified from $n$ copies of a mixed state $\rho$. These states are typically provided via an ancilla to the circuit. One magic state (the "T-type" state in Bravyi and Kitaev's terminology, which is not the same as the T state $T|+\rangle$) is $|M\rangle =\cos(\beta /2)|0\rangle +e^{i \frac{\pi}{4}}\sin(\beta /2)|1\rangle$ where $\beta =\arccos \left(\frac{1}{\sqrt{3}}\right)$. By combining (copies of) magic states with Clifford gates, one can make a non-Clifford gate.

*The first magic state distillation algorithm, invented by Sergey Bravyi and Alexei Kitaev, is as follows.*

* Input: Prepare 5 imperfect states.
* Output: An almost pure state having a small error probability.
* repeat
  * Apply the decoding operation of the five-qubit error correcting code and measure the syndrome.
  * If the measured syndrome is ${\displaystyle |00000\rangle }$, the distillation attempt is successful.
  * else Get rid of the resulting state and restart the algorithm.
* until The states have been distilled to the desired purity.

[**Magic state distillation**](https://en.m.wikipedia.org/wiki/Magic_state_distillation) is a procedure in quantum computation that is used to create high-fidelity quantum states, starting from several copies of a "noisy" state. The name "distillation" is used because it's similar to the idea of distilling a liquid to increase its purity.

The technique was first proposed by Emanuel Knill in 2004, and further analyzed by Sergey Bravyi and Alexei Kitaev the same year.

Here's a high-level overview of how it works:

1. **Preparation**: Prepare several copies of a quantum state. These states are typically "noisy" copies of a desired magic state, meaning they have some errors or deviations from the perfect magic state.

2. **Error Detection**: Apply a quantum error-detecting code to these copies. This code is a series of quantum operations that can identify (but not correct) certain errors in the quantum states.

3. **Measurement**: Measure the error-detecting code. This does not directly measure the quantum states themselves but measures whether an error has been detected.

4. **Post-selection**: If the measurement indicates that no errors were detected, then (due to the properties of quantum mechanics) the quantum states are projected into a state that is closer to the perfect magic state than the original states were. If an error was detected, discard the states and start over.

5. **Repeat**: Repeat this process several times. Each time, the states get closer to the perfect magic state.

By repeating this process, one can "distill" a set of noisy magic states into a smaller number of higher-fidelity magic states. This is a crucial part of many fault-tolerant quantum computing schemes, as it allows one to perform universal quantum computation even in the presence of noise.
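
A rough numerical sketch of the repeat-until-good-enough loop above. The error map `eps -> 35 * eps**3` and the 15 input states per round are assumptions taken from the commonly quoted leading-order behaviour of the 15-to-1 protocol, not from these notes. The function name `distill` and its parameters are hypothetical, and discarded (rejected) attempts are ignored in the cost.

```python
def distill(eps_in, target, n_in=15, coeff=35.0, order=3, max_rounds=10):
    """Iterate the assumed error map eps -> coeff * eps**order until eps <= target.

    Returns (rounds, final_eps, raw_states_per_output_state).
    """
    eps, rounds = eps_in, 0
    while eps > target and rounds < max_rounds:
        new_eps = coeff * eps ** order
        if new_eps >= eps:
            # Above the protocol's threshold, distillation makes things worse.
            raise ValueError("input error is above the distillation threshold")
        eps = new_eps
        rounds += 1
    return rounds, eps, n_in ** rounds

rounds, eps, cost = distill(eps_in=0.01, target=1e-10)
print(rounds, eps, cost)
```

Under these assumptions the error falls cubically per round while the number of raw input states grows by a factor of 15 per round, which is why distillation dominates the resource cost of many fault-tolerant schemes.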

###### *T-Gate and Toffoli Gate*

Video: [Jens Eisert | September 27, 2022 | A Single T-Gate Makes Quantum Distribution Learning Hard](https://www.youtube.com/watch?v=HAKPulM08hs)

**What is so special about the T-gate?**

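* The T gate is a non-Clifford gate and accounts for the majority of non-Clifford gates used. It is the most expensive gate (in schemes based on the surface code), it completes the Clifford gates to a universal gate set, and it applies a phase shift of $e^{\frac{i \pi}{4}}$, i.e. a rotation by $\frac{\pi}{4}$ around the Z-axis.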
* [Craig Gidney of why T-gates are best option](https://quantumcomputing.stackexchange.com/questions/33357/universal-gate-set-magic-states-and-costliness-of-the-t-gate/33358#33358)
* [Can someone explain to me what is T gate and tdg like im five?](https://quantumcomputing.stackexchange.com/questions/26666/can-someone-explain-to-me-what-is-t-gate-and-tdg-like-im-five)
* [Why do we care about the number of 𝑇 gates in a quantum circuit?](https://quantumcomputing.stackexchange.com/questions/21580/why-do-we-care-about-the-number-of-t-gates-in-a-quantum-circuit)

> $\mathbf{T}=\sqrt[4]{\mathbf{Z}}=\left[\begin{array}{cc}1 & 0 \\ 0 & e^{\frac{i \pi}{4}}\end{array}\right]$

> Applying it: $T|1\rangle=\left[\begin{array}{ll}1 & 0 \\ 0 & e^{\frac{i \pi}{4}}\end{array}\right]\left[\begin{array}{l}0 \\ 1\end{array}\right]=e^{\frac{i \pi}{4}}|1\rangle$

> Since QPE will give us $\theta$ where $T|1\rangle=e^{2 i \pi \theta}|1\rangle$, we expect to find $\theta=\frac{1}{8}$.
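
The expected QPE result can be cross-checked classically by reading the phase straight off the eigenvalue. This NumPy sketch does not simulate the QPE circuit itself; it only computes the $\theta$ that QPE would estimate, and the 3-bit string it would ideally return (variable names are chosen here for illustration).

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
ket1 = np.array([0, 1], dtype=complex)

# Eigenvalue of T on |1>: <1|T|1> = e^{2*pi*i*theta}.
eigenvalue = np.vdot(ket1, T @ ket1)
theta = (np.angle(eigenvalue) / (2 * np.pi)) % 1

# With 3 counting qubits, theta = 1/8 is exactly representable as 0.001 in binary.
n_bits = 3
bits = format(int(round(float(theta) * 2 ** n_bits)), f"0{n_bits}b")

print(theta, bits)
```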

*Paper [Universal Quantum Computation with ideal Clifford gates and noisy ancillas](https://arxiv.org/abs/quant-ph/0403025): Figure 1 provides a nice visualization of the two types of magic states; see also Definition 1 on page 3. Confusingly, the T state is the "H-type state" and not the "T-type state" in their terminology, since the density matrix of the T state is $(I+\frac{1}{\sqrt{2}}(X+Y))/2$. In the old days these states seem to have been named after the operators of which they are eigenvectors, rather than after the gate you can realize with them, as is done today. The caption of Figure 1 undersells its significance: the octahedron consists of states that don't boost the computational power of the Clifford gate set (see Theorem 1 on the same page), and it is the high-purity states outside the octahedron that are useful. Arguably the eigenstate is the more sensible naming scheme, $T|T\rangle = t|T\rangle$: it names things for what they are instead of for what you can do with them (which is subject to change).*

**What is so special about the Toffoli-gate?**

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/Qcircuit_ToffolifromCNOT.svg/799px-Qcircuit_ToffolifromCNOT.svg.png)

* A quantum mechanical Toffoli gate can be constructed out of 1-qubit and 2-qubit gates (see the circuit above).
* The [Toffoli gate](https://en.m.wikipedia.org/wiki/Toffoli_gate#) is a reversible gate and is universal for classical computing: it can simulate any classical computation (https://youtu.be/qs0D9sdbKPU?t=6520). It is an essential tool for building big classical operations like multipliers and [multiplexers](https://en.m.wikipedia.org/wiki/Multiplexer)
* It is used in Shor's algorithm, quantum error correction, fault-tolerant computation and quantum arithmetic operations (https://www.nature.com/articles/npjqi201619)
* Many of Google's papers talk about directly distilling Toffoli rather than T: Toffoli has many clear connections to classical logic and is a more natural gate when your algorithm is bottlenecked by classical logic rather than single-qubit rotations
* [Are Toffoli gates actually used in designing quantum circuits?](https://quantumcomputing.stackexchange.com/questions/11915/are-toffoli-gates-actually-used-in-designing-quantum-circuits)
* [A Simple Proof that Toffoli and Hadamard are Quantum Universal](https://arxiv.org/abs/quant-ph/0301040)
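
The classical universality mentioned above comes from the fact that a Toffoli gate with its target bit preset to 1 computes NAND, which is itself universal for classical logic. A minimal sketch follows: the function names are hypothetical, and the 8x8 matrix uses the basis ordering $|abc\rangle$ with $a, b$ as controls and $c$ as target.

```python
import numpy as np
from itertools import product

def toffoli(a, b, c):
    """Classical action of the Toffoli gate: flip the target c iff both controls are 1."""
    return a, b, c ^ (a & b)

def nand(a, b):
    """NAND from a single Toffoli with the target initialised to 1."""
    return toffoli(a, b, 1)[2]

# The same gate as an 8x8 permutation matrix: it swaps |110> and |111>.
TOFFOLI = np.eye(8, dtype=int)
TOFFOLI[[6, 7]] = TOFFOLI[[7, 6]]

def apply_matrix(a, b, c):
    """Apply the matrix to the basis state |abc> and read the result back as bits."""
    vec = np.zeros(8, dtype=int)
    vec[4 * a + 2 * b + c] = 1
    out = int(np.argmax(TOFFOLI @ vec))
    return (out >> 2) & 1, (out >> 1) & 1, out & 1

for a, b in product((0, 1), repeat=2):
    print(a, b, nand(a, b))
```

Applying the gate twice restores the input, which reflects its reversibility.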

###### *Hadamard Gate*

Video: [Logic Gates Rotate Qubits](https://www.youtube.com/watch?v=ZBaXPY_0TNI)

Video: [Qubits vs Bits: The Kickback Effect](https://www.youtube.com/watch?v=EjdngeGXWEg)

*Hadamard Gate (Superposition)*

> $H=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)$

**The Hadamard states ∣+⟩ and ∣−⟩ are considered superposition states**

because they are a combination of the two computational states:

> $|\pm\rangle=\frac{1}{\sqrt{2}}|0\rangle \pm \frac{1}{\sqrt{2}}|1\rangle$

**Apply Hadamard gate on a qubit that is in the |0> state**:

> <font color="blue">$\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ 1\end{array}\right]$

The qubit enters a new state where the probability of measuring 0 is:

* $\left(\frac{1}{\sqrt{2}}\right)^{2}=\frac{1}{2}$

And the probability of measuring 1 is also:

* $\left(\frac{1}{\sqrt{2}}\right)^{2}=\frac{1}{2}$

**Now apply Hadamard gate on a qubit that is in the |1> state**:

> <font color="blue">$\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ -1\end{array}\right]$

The qubit enters a new state where the probability of measuring 0 is:

* $\left(\frac{1}{\sqrt{2}}\right)^{2}=\frac{1}{2}$

And the probability of measuring 1 is also:

* $\left(\frac{-1}{\sqrt{2}}\right)^{2}=\frac{1}{2}$

Hence, in both cases (qubit |0> or qubit |1>) applying a Hadamard gate gives an equal chance for the qubit to be 0 or 1 when measured.

Source: https://freecontent.manning.com/all-about-hadamard-gates/

**Derivation of the Hadamard Gate (Important!)**

For an equal (or uniform) superposition of the two computational states, we can set the two coefficients equal to each other:

$a_{1}=a_{2}=a$

The normalization condition for a well-behaved quantum state requires that the sum of the squared magnitudes of the coefficients be equal to one; this is sufficient to find $a$ for a uniform superposition:

$|a|^{2}+|a|^{2}=1$

$2|a|^{2}=1$

$|a|^{2}=\frac{1}{2}$

$a=\frac{1}{\sqrt{2}}$

In vector form, this state can be represented as

> $\left[\begin{array}{l}\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}\end{array}\right]$ = $\frac{1}{\sqrt{2}}$ $\left[\begin{array}{c} 1 \\ 0 \end{array}\right]$ + $\frac{1}{\sqrt{2}}$ $\left[\begin{array}{c} 0 \\ 1 \end{array}\right]$

This is what we can use now to understand Hadamard, where you want "halfway" a 50/50 chance of basis states 0 and 1 (Bloch sphere representation of superposition state):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_025.jpg)

*Hadamard gate operations:*

> $\begin{aligned} H(|0\rangle) &=\frac{1}{\sqrt{2}}|0\rangle+\frac{1}{\sqrt{2}}|1\rangle=:|+\rangle \\ H(|1\rangle) &=\frac{1}{\sqrt{2}}|0\rangle-\frac{1}{\sqrt{2}}|1\rangle=:|-\rangle \\ H\left(\frac{1}{\sqrt{2}}|0\rangle+\frac{1}{\sqrt{2}}|1\rangle\right) &=\frac{1}{2}(|0\rangle+|1\rangle)+\frac{1}{2}(|0\rangle-|1\rangle)=|0\rangle \\ H\left(\frac{1}{\sqrt{2}}|0\rangle-\frac{1}{\sqrt{2}}|1\rangle\right) &=\frac{1}{2}(|0\rangle+|1\rangle)-\frac{1}{2}(|0\rangle-|1\rangle)=|1\rangle \end{aligned}$

One application of the Hadamard gate to either a 0 or 1 qubit will produce a quantum state that, if observed, **will be a 0 or 1 with equal probability** (as seen in the first two operations). This is exactly like flipping a fair coin in the standard probabilistic model of computation. However, if the Hadamard gate is applied twice in succession (as is effectively being done in the last two operations), then the final state is always the same as the initial state, because the Hadamard gate is its own inverse ($H^2 = I$). Like every quantum gate it is reversible (unitary), unlike many operations on classical computers.
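The last two operations can be verified with a few lines of NumPy. The sketch below only assumes the standard matrix form of the gate; the variable names are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# First application: basis states go to the |+> and |-> superpositions
plus = H @ ket0
minus = H @ ket1

# Second application: the amplitudes interfere and the basis state returns
back_to_0 = H @ plus
back_to_1 = H @ minus

# Equivalent statement at the matrix level: H is its own inverse
H_squared = H @ H
```

This is the difference from a classical coin flip: flipping a fair coin twice still gives a random result, while the second Hadamard undoes the first.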

###### *Quantum Error Correction*

**Error Correction Zoo**

https://errorcorrectionzoo.org/c/honeycomb

https://errorcorrectionzoo.org/list/ag

https://errorcorrectionzoo.org/list/homological

https://arxiv.org/abs/quant-ph/0110143

https://arxiv.org/abs/1603.02286

Sometimes transversal means specifically "the logical operation has constant depth, regardless of code distance". For example, the transversal S gate in the folded surface code uses physical gates that aren't the S gate. It uses two qubit gates across the folded halves.

https://quantumcomputing.stackexchange.com/questions/24269/what-is-formally-a-transversal-operator

**How does quantum error correction work?**

Quantum error correction is a set of techniques for protecting quantum information from errors due to decoherence and other quantum noise. Here's a high-level summary:

1. **Encoding**: The first step in quantum error correction is to encode the quantum information. Rather than storing a quantum bit of information (a "qubit") in a single physical qubit, we store it in multiple physical qubits. This is done in such a way that, even if some of the physical qubits are corrupted by noise, the original quantum information can still be recovered. This encoding is done using a quantum error-correcting code. There are many different types of quantum error-correcting codes, each with its strengths and weaknesses.

2. **Syndrome measurement**: Once the quantum information has been encoded, the next step is to periodically check for errors. This is done by performing a syndrome measurement, which is a special kind of quantum measurement that can detect whether an error has occurred, and if so, what kind of error it was. Importantly, this measurement does not disturb the encoded quantum information.

3. **Error correction**: If the syndrome measurement indicates that an error has occurred, the next step is to perform an error correction. This involves applying a series of quantum gates to the physical qubits to correct the error, based on the result of the syndrome measurement.

4. **Repeat**: Because quantum systems are always subject to noise, this process of syndrome measurement and error correction needs to be repeated periodically to keep the errors in check.
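The four steps can be sketched with the three-qubit bit-flip code, simulated as a plain state vector in NumPy. This is a toy illustration under stated assumptions: it only handles a single X error, all function names are made up for this example, and the syndrome is read off as an expectation value (possible in a simulation because the state is an eigenstate of both checks) rather than through ancilla qubits as on real hardware.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron3(a, b, c):
    """Three-qubit operator; the first factor acts on qubit 0."""
    return np.kron(np.kron(a, b), c)

def encode(alpha, beta):
    """Step 1: alpha|0> + beta|1>  ->  alpha|000> + beta|111>."""
    state = np.zeros(8, dtype=complex)
    state[0b000] = alpha
    state[0b111] = beta
    return state

# Step 2: the two parity checks Z0Z1 and Z1Z2
S1 = kron3(Z, Z, I2)
S2 = kron3(I2, Z, Z)

# Bit-flip operator on each of the three physical qubits
X_ON = [kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]

# Syndrome (eigenvalues of the two checks) -> which qubit to flip back
LOOKUP = {(1, 1): None, (-1, 1): 0, (-1, -1): 1, (1, -1): 2}

def syndrome(state):
    s1 = int(round(np.real(np.vdot(state, S1 @ state))))
    s2 = int(round(np.real(np.vdot(state, S2 @ state))))
    return s1, s2

def correct(state):
    """Step 3: apply X to the qubit singled out by the syndrome."""
    qubit = LOOKUP[syndrome(state)]
    return state if qubit is None else X_ON[qubit] @ state

logical = encode(0.6, 0.8)
corrupted = X_ON[1] @ logical      # bit flip on the middle qubit
recovered = correct(corrupted)
```

Note that `syndrome` never looks at `alpha` or `beta`: both checks give the same answer for any encoded superposition, which is why the encoded information is not disturbed.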

Challenges:

* It's worth noting that the whole process of quantum error correction requires a significant overhead in terms of additional physical qubits and quantum operations. This is a major challenge in the development of large-scale, fault-tolerant quantum computers.

* It's also worth noting that, although quantum error correction can protect against many types of errors, it cannot protect against all possible errors. In particular, **it's assumed that the errors are relatively rare and do not all occur at once, and that they are independent and identically distributed across the physical qubits**. **If these assumptions are violated, then quantum error correction may not be able to correct the errors**. This is another major challenge in the field of quantum error correction.

* Video: [Introduction to Toric Code](https://youtu.be/ZRqgAbBGg40) (very good)
* Video: [The superconducting transmon qubit](https://youtu.be/dKTNBN99xLw)
* Video: [The transmon qubit](https://youtu.be/cb_f9KpYipk)
* Video: [Making quantum error correction practical](https://youtu.be/YPFpll1NFQc)
* Video: [Steven Girvin](https://youtu.be/nhUKHf-GN_Y)
* Video: [Quantum Industry Talks](https://youtu.be/eyICn3KCUPI)
* Video: [Qiskit QEC](https://youtu.be/ZY8PddknCos), [Qiskit](https://youtu.be/SHr3uSv9Bts), [qiskit](https://youtu.be/96a0G4G5ZH8)

***Concepts:*** *Ancilla qubits, surface code, code distance, X (bit flip) and Z (phase flip) errors, threshold theorem, required fault tolerance, logical vs physical qubits*

*Tasks: Error detection, error mitigation, error correction, error suppression*

Video [Progress Towards Quantum Error Correction with the Surface Code | Qiskit Seminar Series](https://www.youtube.com/watch?v=si5a9RJP01A)

Use more than one physical qubit to represent a state, and check neighboring qubits against each other (so you never measure the exact state, which would collapse the quantum state). From the outcomes of these checks you can decode that, in this example, the error was on the last qubit. Then you can correct that physical qubit to get back the correct logical qubit state:

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1278.png)

[An Introduction to Quantum Error Correction and Fault-Tolerant Quantum Computation](https://arxiv.org/pdf/0904.2557.pdf)

What are abelian groups needed for in quantum computing?

In quantum computing, Abelian groups are used to describe the symmetry of a quantum system. Symmetry is a fundamental concept in quantum mechanics, and it refers to the idea that a physical system will remain unchanged under certain transformations. For example, the symmetry of a quantum system may be described by a group of rotations, translations, or reflections. Abelian groups are used to describe the symmetry of a quantum system because they have the useful property of being commutative, meaning that the order in which the transformations are applied does not affect the outcome. This property makes it possible to use Abelian groups to describe the symmetries of a quantum system in a way that is mathematically tractable and easy to work with. In quantum error correction in particular, the stabilizers of a stabilizer code form an Abelian subgroup of the Pauli group: because they all commute, they can be measured together without disturbing one another.

**Classical Error detection and correction**

* https://en.m.wikipedia.org/wiki/Error_detection_and_correction

* https://en.m.wikipedia.org/wiki/Repetition_code

**Fault-Tolerant QC (Quantum Error Correction)**

* Bit flip (from 0 to 1) or dephasing (from superposition to exact state)

* Challenge: we need to keep the states correct without looking at them (because then we dephase them ourselves)

* We need a method to **build relatively noiseless qubits (logical qubits) out of many noisy ones (physical qubits)**. This is quantum error correction.

* Solution: one way is repetition encoding. We sit our qubits on a line, then go along and ask every pair of next-door neighbours whether they agree or disagree with each other. This tells us nothing about whether they are 0 or 1. But repetition encoding, which protects against bit-flip errors so well, actually makes dephasing more likely!

* https://en.m.wikipedia.org/wiki/Quantum_error_correction


* Examples of QEC: **repetition code** (simplest QEC) and **surface code** (and color codes?)

* techniques: syndrome measurements, decoding, logical operations

* https://www.quantamagazine.org/how-space-and-time-could-be-a-quantum-error-correcting-code-20190103/


* [An introduction to Fault-tolerant Quantum Computing](https://arxiv.org/abs/1508.03695)

http://decodoku.blogspot.com/2016/02/5-story-so-far_57.html

* [Introduction to Quantum Error Correction](https://cpb-us-w2.wpmucdn.com/voices.uchicago.edu/dist/0/2327/files/2019/11/QECIntro.pdf)

**Excursus: Error Mitigation**

* Quantum error correction is the long-term goal; in the meantime we try to mitigate errors

* Error mitigation techniques: statistical corrections (applied to the measured histogram, for example)

	* https://qiskit.org/textbook/ch-quantum-hardware/measurement-error-mitigation.html

	* https://arxiv.org/abs/2005.10189

**DiVincenzo's criteria**

Video: [Introduction to Toric Code](https://youtu.be/ZRqgAbBGg40) (very good!)

Video: [Gottesman 1](https://youtu.be/ltJ1jXQeDl8) and [Gottesman 2](https://youtu.be/cUqys29d0YA)

[DiVincenzo's criteria](https://en.m.wikipedia.org/wiki/DiVincenzo%27s_criteria) are conditions necessary for constructing a quantum computer, conditions proposed in 2000 by the theoretical physicist David P. DiVincenzo.

**1. A scalable physical system with well-characterized qubits.**

**2. The ability to initialize the state of the qubits to a simple fiducial state, such as $|000\ldots\rangle$.**

**3. Long relevant decoherence times, much longer than the gate operation time.**

**4. A "universal" set of quantum gates** (that approximate any unitary operation - a unitary transformation preserves the inner product, which is a property of the Hilbert space)

**5. A qubit-specific measurement capability** (ability to measure individual qubits)

* Trapped Ion and superconducting qubits do really well on all five criteria

6. *The ability to interconvert stationary and flying qubits.*

7. *The ability to faithfully transmit flying qubits between specified locations.*

* The DiVincenzo criteria consist of seven conditions an experimental setup must satisfy to successfully implement quantum algorithms such as Grover's search algorithm or Shor factorization.

* The first five conditions regard quantum computation itself. Two additional conditions regard implementing quantum communication, such as that used in quantum key distribution. One can demonstrate that DiVincenzo's criteria are satisfied by a classical computer.

* Comparing the ability of classical and quantum regimes to satisfy the criteria highlights both the complications that arise in dealing with quantum systems and the source of the quantum speed up.

*Universal quantum computing vs. quantum annealing: quantum annealers do not meet all of the criteria*

**Definition of Quantum Computing**

[Quantum computing](https://en.m.wikipedia.org/wiki/Quantum_computing) is a type of computation whose operations can harness the phenomena of quantum mechanics, such as superposition, interference, and entanglement to perform computation. Devices that perform quantum computations are known as quantum computers.

*Harnessing effects of quantum mechanics: by this definition, quantum annealers are also quantum computers.*

###### *Stabilizer Code and Parity Check Measurement*

**Excursus: Stabilizer Code & Ancilla Qubits**

**A [stabilizer](https://en.m.wikipedia.org/wiki/Stabilizer_code) quantum error-correcting code appends [ancilla qubits](https://en.m.wikipedia.org/wiki/Ancilla_bit) to qubits that we want to protect**. A unitary encoding circuit rotates the global state into a subspace of a larger Hilbert space. This highly entangled, encoded state corrects for local noisy errors.

https://en.m.wikipedia.org/wiki/Stabilizer_code

In quantum computing, **a stabilizer code is a type of error-correcting code** that is used to protect quantum information from the effects of noise and decoherence. These codes are based on stabilizer operators: mutually commuting Pauli operators that leave every valid encoded state unchanged (eigenvalue +1). The basic idea behind stabilizer codes is to encode the quantum information in the states fixed by this set of operators, so that errors can be detected and corrected by measuring the operators: an error that anticommutes with a stabilizer flips its measured value to -1. (Source: GPT)


Many quantum error correction schemes can be classified as stabilizer codes, where a single bit of **quantum information is encoded in the joint state of many physical qubits**, which we refer to as data qubits. Interspersed among the data qubits are **measure qubits**, which periodically measure the parity of chosen combinations of data qubits. https://arxiv.org/pdf/2102.06132.pdf


https://www.youtube.com/watch?v=Rs2NMe4Lsbw&t=456s

https://leftasexercise.com/2019/01/28/basics-of-quantum-error-correction/

https://leftasexercise.com/2019/02/04/q-fault-tolerant-quantum-computing/

https://leftasexercise.com/2019/03/25/qec-an-introduction-to-toric-codes/

https://leftasexercise.com/2019/04/08/quantum-error-correction-the-surface-code/

https://leftasexercise.com/2019/02/11/quantum-error-correction-with-stabilizer-codes/

https://leftasexercise.com/2018/09/10/quantum-computing-an-overview/

Stabilizer code with parity check circuits (for Z and X errors)

**Parity measurement**: is the parity even (no error detected) or odd (error)? You will have a square lattice with one dimension for X-error measurement and another dimension for Z-error measurement. An **ancilla qubit** makes each measurement with controlled-Z or controlled-X gates (but you first put it into an equal superposition).
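A parity check through an ancilla can be simulated directly. The sketch below is a small NumPy state-vector model for two data qubits and one ancilla, following the recipe above: Hadamard on the ancilla, a controlled version of the check operator, Hadamard again, then measure the ancilla. The function name `ancilla_one_probability` and the choice of a Bell state as the "encoded" data are assumptions made for this example.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # projector |0><0| on the ancilla
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # projector |1><1| on the ancilla

def ancilla_one_probability(check, data_state):
    """Probability that the ancilla reads 1 (odd parity) for a two-qubit check."""
    h_on_ancilla = np.kron(H, np.eye(4))
    # Apply `check` to the data only when the ancilla is in |1>
    controlled_check = np.kron(P0, np.eye(4)) + np.kron(P1, check)
    state = np.kron(np.array([1.0, 0.0]), data_state)  # ancilla starts in |0>
    state = h_on_ancilla @ controlled_check @ h_on_ancilla @ state
    # The ancilla is the first tensor factor, so the last 4 amplitudes have ancilla = 1
    return float(np.sum(np.abs(state[4:]) ** 2))

ZZ = np.kron(Z, Z)  # Z-type check: detects bit flips (X errors)
XX = np.kron(X, X)  # X-type check: detects phase flips (Z errors)

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
bit_flipped = np.kron(X, I2) @ bell                  # X error on the first data qubit
phase_flipped = np.kron(Z, I2) @ bell                # Z error on the first data qubit
```

With no error both checks leave the ancilla in 0; a bit flip triggers only the `ZZ` check and a phase flip triggers only the `XX` check, which is why both types of check are needed.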

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1283.png)

You put both X and Z together and get a 2D lattice of surface code with a determined code distance. We have now data qubits, X ancilla qubits and Z ancilla qubits.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1284.png)

**Excursus: Repetition Codes**

* look at majority of bits

* https://qiskit.org/textbook/ch-quantum-hardware/error-correction-repetition-code.html

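* repetition code: redundancy (repetition) is a way to make sure the message gets delivered. With majority voting over $d$ repetitions (odd $d$), where each copy is flipped independently with probability $p$, decoding fails when more than half of the copies are flipped: $P=\sum_{n=\lceil d / 2\rceil}^{d}\left(\begin{array}{l}d \\ n\end{array}\right) p^{n}(1-p)^{d-n}$, which for small $p$ scales as $p^{\lceil d / 2\rceil}$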
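As a rough numerical illustration of majority voting, the sketch below computes the probability that more than half of $d$ copies are flipped, assuming each copy is flipped independently with probability $p$. The function name is illustrative.

```python
from math import comb

def majority_vote_failure(p, d):
    """Probability that more than half of d copies are flipped (d must be odd)."""
    if d % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    threshold = d // 2 + 1  # smallest number of flips that outvotes the correct value
    return sum(comb(d, n) * p**n * (1 - p) ** (d - n) for n in range(threshold, d + 1))

for d in (1, 3, 5, 7):
    print(d, majority_vote_failure(0.01, d))
```

For $p < 1/2$ the failure probability drops quickly as $d$ grows; at $p = 1/2$ repetition does not help at all.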

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_100.png)

###### *ZX-calculus*

https://arxiv.org/abs/1307.7025

The ZX-calculus is complete for stabilizer quantum mechanics

ZX Calculus: Hopf rule

Reasoning with connectivity using diagrammatic reasoning

ZX Calculus: Hadamard rule

Evaluating quantum circuits using diagrammatic reasoning.

https://en.wikipedia.org/wiki/ZX-calculus

https://en.wikipedia.org/wiki/Penrose_graphical_notation

https://en.wikipedia.org/wiki/Tensor_network_theory?wprov=sfti1

https://en.wikipedia.org/wiki/Unified_field_theory



https://en.wikipedia.org/wiki/Matrix_product_state?wprov=sfti1


https://en.wikipedia.org/wiki/Density_matrix_renormalization_group?wprov=sfti1

https://en.wikipedia.org/wiki/Categorical_quantum_mechanics?wprov=sfti1

And finally I'll learn about fault-tolerance cause now it's in a language that I can understand, cause I co-invented it with Ross Duncan!

https://lnkd.in/dX2YDzfF

But the “newly introduced ZX-instruments” want to be the bastard spiders of the dodo-book. They go back a long time, even before ZX calculus itself, and published here:

https://lnkd.in/e4VvrGHA

Quantum In Pictures also has that stuff, and so does this paper:

https://lnkd.in/eSwFWkHZ

###### *Fidelity (Error Probability)*

Physical and Logical Qubits

https://en.m.wikipedia.org/wiki/Physical_and_logical_qubits

**Fidelity**: For Shor's algorithm, with an estimated $10^9$ physical gates required, the error per gate should be less than $10^{-9}$, ideally $10^{-10}$. We are still several orders of magnitude away from that accuracy / fault tolerance.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1279.png)

Many applications call for error rates in the $10^{-15}$ regime [2–9], but state-of-the-art quantum platforms typically have physical error rates near $10^{-3}$.
https://arxiv.org/pdf/2102.06132.pdf

To run quantum algorithms perfectly we need error probability of 1 in a billion or 1 in a trillion - but we are at 1 in a thousand
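A back-of-the-envelope check of these numbers: if every gate fails independently with probability $p$ and a single failure ruins the run (both simplifying assumptions made here, not claims from the sources above), the chance that an $N$-gate circuit runs without any error is $(1-p)^N$.

```python
import math

def circuit_success_probability(error_per_gate, n_gates):
    """(1 - p)^N, computed through logarithms for numerical stability."""
    return math.exp(n_gates * math.log1p(-error_per_gate))

n_gates = 1e9  # the gate count quoted for Shor's algorithm above
for p in (1e-3, 1e-9, 1e-10):
    print(p, circuit_success_probability(p, n_gates))
```

At $p = 10^{-3}$ the success probability is indistinguishable from zero, at $p = 10^{-9}$ it is about $e^{-1}$, and at $p = 10^{-10}$ it is about $e^{-0.1}$, which is the gap that error correction has to close.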

Video: [Suppressing quantum errors by scaling a surface code logical qubit](https://www.youtube.com/watch?v=dVkLNwSTBU0)

**Excursus: Fidelity of quantum states**

* https://en.m.wikipedia.org/wiki/Fidelity_of_quantum_states

* Noise transforms pure states into mixed states.

  * There are also simpler ones: Fidelity between two pure states

  * And there are also more complex ones: Fidelity between two mixed states

* fidelity is generally defined as the quantity:

> ${\displaystyle F(\rho ,\sigma )=\left(\operatorname {tr} {\sqrt {{\sqrt {\rho }}\sigma {\sqrt {\rho }}}}\right)^{2}}$

* The least useful value is a fidelity of 0.5, because fidelity = 0 means the states are orthogonal (perfectly distinguishable) and fidelity = 1 means they are exactly the same.

* Video: [Fidelity](https://www.youtube.com/watch?v=GWi_HIVz2B4)
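The fidelity formula above can be evaluated numerically for small density matrices. This is a minimal NumPy sketch: the matrix square root is taken through an eigendecomposition (valid for Hermitian positive semidefinite matrices), and the helper names `psd_sqrt`, `fidelity` and `density_matrix` are illustrative.

```python
import numpy as np

def psd_sqrt(matrix):
    """Square root of a Hermitian positive semidefinite matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # remove tiny negative round-off
    return (eigenvectors * np.sqrt(eigenvalues)) @ eigenvectors.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    sqrt_rho = psd_sqrt(rho)
    inner = psd_sqrt(sqrt_rho @ sigma @ sqrt_rho)
    return float(np.real(np.trace(inner)) ** 2)

def density_matrix(state):
    """Density matrix |psi><psi| of a pure state."""
    state = np.asarray(state, dtype=complex)
    return np.outer(state, state.conj())

rho0 = density_matrix([1, 0])
rho1 = density_matrix([0, 1])
rho_plus = density_matrix([1 / np.sqrt(2), 1 / np.sqrt(2)])
maximally_mixed = np.eye(2) / 2

print(fidelity(rho0, rho0), fidelity(rho0, rho1))
print(fidelity(rho0, rho_plus), fidelity(rho0, maximally_mixed))
```

For two pure states the formula reduces to the squared overlap $|\langle\psi|\phi\rangle|^2$, which is what the `rho0` versus `rho_plus` case computes.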

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1275.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1276.png)

###### *Error Syndrome (Measurement)*

There are two types of errors that you want to detect: **Bit-flip** (represented with the Pauli-X gate) and **Phase-flip** (represented with the Pauli-Z gate). But a 1D string of physical qubits cannot protect against bit flips and phase flips at the same time. You need a 2D lattice.
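The effect of the two error types on single-qubit states can be seen with the Pauli matrices directly; the sketch below uses plain NumPy vectors and illustrative variable names.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])   # bit flip
Z = np.array([[1, 0], [0, -1]])  # phase flip

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

bit_flip_on_0 = X @ ket0         # |0> becomes |1>
phase_flip_on_0 = Z @ ket0       # |0> is unchanged
bit_flip_on_plus = X @ plus      # |+> is unchanged
phase_flip_on_plus = Z @ plus    # |+> becomes |->
```

A bit flip is invisible on $|+\rangle$ and a phase flip is invisible on $|0\rangle$, so a check that only looks in one basis misses one of the two error types.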

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1281.png)

**Nota bene: qubits can also be lost entirely**

* Quantum computers can by now cope with a certain number of computational errors, such as bit-flip or phase-flip errors. In addition to these errors, however, qubits can also be lost from the quantum register altogether.

* Depending on the type of quantum computer, this can be due to the actual loss of particles such as atoms or ions, or to quantum particles transitioning, for example, into unwanted energy states that are no longer recognized as a qubit. When a qubit is lost, the information in the remaining qubits becomes unreadable and unprotected. For the result of the computation, this process can turn into a potentially devastating error.

https://www.cosmos-indirekt.de/News/Neue_Methode_schützt_Quantencomputer_vor_Ausfällen.html

Resolving catastrophic error bursts from cosmic rays in large arrays of superconducting qubits.

https://arxiv.org/abs/2104.05219

https://physicsworld.com/a/cosmic-ray-threat-to-quantum-computing-greater-than-previously-thought/

**Error Syndrome Measurement**

**When a |1> is measured instead of a |0>: the pattern of measured |1>'s is called the "error syndrome".**

Syndrome: indicates which qubit the error is on

Error syndrome measurement:

https://www.quora.com/Error-Correcting-Codes-What-is-a-syndrome:

* The syndrome measurement provides information about the error that has happened, but not about the information that is stored in the logical qubit—as otherwise the measurement would destroy any quantum superposition of this logical qubit with other qubits in the quantum computer, which would prevent it from being used to convey quantum information. (https://en.m.wikipedia.org/wiki/Quantum_error_correction)

* It is the result of multiplying a parity check matrix times a vector. By convention, codewords of a code have syndrome zero, so that by linearity of the code, the syndrome of a word is the syndrome of the "error" vector. Typically, from the syndrome you would first try to determine whether there was an error (is the syndrome nonzero?) and then recover the error from it, so that in turn you can recover the data from the received word.
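The "parity check matrix times a vector" description can be made concrete with a classical example. The sketch below uses the [7,4] Hamming code, chosen here only as a familiar illustration; the names `H_CHECK`, `syndrome` and `locate_error` are made up for this example.

```python
import numpy as np

# Parity-check matrix of the classical [7,4] Hamming code:
# column j is the number j+1 written in binary (top row = most significant bit)
H_CHECK = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
])

def syndrome(word):
    """Syndrome = parity-check matrix times the word, modulo 2."""
    return (H_CHECK @ np.asarray(word)) % 2

def locate_error(s):
    """For a single-bit error the syndrome spells the error position in binary."""
    position = int(s[0]) * 4 + int(s[1]) * 2 + int(s[2])
    return None if position == 0 else position - 1  # zero-based index

codeword = np.array([1, 1, 1, 0, 0, 0, 0])
error = np.array([0, 0, 0, 0, 1, 0, 0])     # flip the bit at index 4
received = (codeword + error) % 2

print(syndrome(codeword), syndrome(received), locate_error(syndrome(received)))
```

Because the code is linear, `syndrome(received)` equals `syndrome(error)`: the syndrome depends only on the error, not on which codeword was sent.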

Is the syndrome the same as the error? See slide 4 here: https://people.engr.tamu.edu/andreas-klappenecker/689/stabilizer.pdf

###### *Code Distance and Surface Code*

https://en.m.wikipedia.org/wiki/Five-qubit_error_correcting_code

The quantum **threshold theorem** ensures that there is a limit on the physical error rate below which errors can be kept under control, even with larger numbers of physical qubits. The **code distance** required then follows from the physical error rate and the target logical error rate, and it determines the number of physical qubits needed for one logical qubit:

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1280.png)

**Excursus: Quantum Threshold Theorem**

* Challenge: a quantum computer will not be able to perform gate operations perfectly; some small constant error is inevitable

* [quantum threshold theorem](https://en.m.wikipedia.org/wiki/Quantum_threshold_theorem) (or quantum fault-tolerance theorem) states that a quantum computer
  * **with a physical error rate below a certain threshold** can,
  * **through application of quantum error correction schemes**,
  * suppress the logical error rate to arbitrarily low levels.

* This shows that quantum computers can be made fault-tolerant, as an analogue to von Neumann's threshold theorem for classical computation

* The formal statement of the threshold theorem depends on the types of error correction codes and error model being considered.

* for any particular error model (such as having each gate fail with independent probability p), use **error correcting codes** to build better gates out of existing gates.

  * Though these "better gates" are larger, and so are more prone to errors within them, their error-correction properties mean that they have a lower chance of failing than the original gate (provided p is a small-enough constant).

  * Then, one can use these better gates to recursively create even better gates, until one has gates with the desired failure probability, which can be used for the desired quantum circuit.

* Current estimates put the threshold for the [surface code](https://en.m.wikipedia.org/wiki/Toric_code) (here: Toric code) on the order of 1%, though estimates range widely and are difficult to calculate due to the exponential difficulty of simulating large quantum systems.

* At a 0.1% probability of a [depolarizing](https://en.m.wikipedia.org/wiki/Depolarization) error, the surface code would require approximately 1,000-10,000 physical qubits per logical data qubit, though more pathological error types could change this figure drastically.
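To get a feeling for these numbers, one can use a commonly quoted heuristic for the surface code, $p_L \approx A\,(p/p_{th})^{(d+1)/2}$. Everything in the sketch below is an assumption for illustration: the scaling formula itself, the prefactor $A = 0.1$, the threshold $p_{th} = 1\%$, and the qubit count $2d^2 - 1$ (which is the count for a rotated surface code patch). None of these values are taken from the sources above.

```python
def logical_error_rate(p, d, p_th=0.01, prefactor=0.1):
    """Heuristic logical error rate for distance d (assumed scaling law)."""
    return prefactor * (p / p_th) ** ((d + 1) / 2)

def required_distance(p, target, p_th=0.01, prefactor=0.1):
    """Smallest odd distance whose heuristic logical error rate meets the target."""
    if p >= p_th:
        raise ValueError("above threshold: increasing the distance does not help")
    d = 3
    while logical_error_rate(p, d, p_th, prefactor) > target:
        d += 2  # surface code distances are usually taken odd
    return d

def physical_qubits(d):
    """Data plus measure qubits of one rotated surface code patch (assumption)."""
    return 2 * d * d - 1

d = required_distance(p=1e-3, target=1e-15)
print(d, physical_qubits(d))
```

With a physical error rate of 0.1% and a target of $10^{-15}$, this toy model lands at a distance in the high twenties and a physical qubit count in the low thousands per logical qubit, consistent with the 1,000-10,000 range quoted above.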




**Excursus: Distance Code / Code Distance**

* https://physics.stackexchange.com/questions/29397/what-is-the-code-distance-in-quantum-information-theory


* Over all 25 cycles of error correction, the distance-5 code realises lower logical error probabilities $p_L$ than the average of the subset distance-3 codes - [Paper](https://arxiv.org/pdf/2207.06431.pdf)

* the distance is the shortest path in a certain "space of errors" which maps between two orthogonal quantum states that are in the code.

* The natural space of errors is that of single qubit errors of the form $\sigma_X$, $\sigma_Y$ or $\sigma_Z$, in the case where the Hilbert space is that of $n$ qubits.

* So you can think of distance as the shortest path to get from one state to another by operations on single qubits, applied one at a time sequentially.

* [Source](https://physics.stackexchange.com/questions/29397/what-is-the-code-distance-in-quantum-information-theory)
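The classical analogue of this "shortest path" picture is the minimum Hamming distance between codewords. The sketch below works with classical bit strings only (for a quantum code one would count single-qubit Pauli errors instead), and all function names are illustrative.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords):
    """Minimum Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

def detectable_errors(d):
    """A distance-d code detects up to d - 1 errors."""
    return d - 1

def correctable_errors(d):
    """A distance-d code corrects up to floor((d - 1) / 2) errors."""
    return (d - 1) // 2

repetition_3 = ["000", "111"]
repetition_5 = ["00000", "11111"]
for code in (repetition_3, repetition_5):
    d = code_distance(code)
    print(d, detectable_errors(d), correctable_errors(d))
```

A distance-2 code gives `correctable_errors(2) == 0`: it can detect a single error but not correct it.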

*Surface Code*

Code distance and surface code are not the same thing in quantum error correction.

* **Code distance** is a measure of the ability of a quantum error-correcting code to correct errors. A code of distance $d$ can detect up to $d-1$ errors and correct up to $\lfloor (d-1)/2 \rfloor$ errors, so a code with a higher code distance can correct more errors than a code with a lower code distance.
* **Surface code** is a specific family of quantum error-correcting codes whose code distance grows with the size of the qubit lattice. Surface codes are often used in quantum computers because they are relatively easy to implement (they need only local measurements on a 2D layout).

In other words, code distance is a property of every quantum error-correcting code, while the surface code is one specific type of code whose distance can be chosen by scaling the lattice.

An analogy for the difference between code distance and surface code:

* Code distance is like the number of lanes on a highway. A highway with more lanes can handle more traffic than a highway with fewer lanes.
* The surface code is like a specific highway design to which lanes can be added as needed to handle more traffic.


**2D Surface code** protects against X errors and Z errors (X and Z - that's why 2D). Errors typically arise only locally. The gate structure needs to fit the physical geometry of the quantum processor. The error per gate should be about 0.5%, but the overall threshold depends on the case.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1282.png)

**Excursus: Stabilizer Code $\rightarrow$ Surface Code $\rightarrow$ Toric Code**

* https://quantumcomputing.stackexchange.com/questions/2106/what-is-the-surface-code-in-the-context-of-quantum-error-correction

* **Surface codes**: family of quantum error correcting codes defined on a 2D lattice of qubits.

* Each code has [stabilizers](https://en.m.wikipedia.org/wiki/Stabilizer_code) that are defined equivalently in the bulk, but differ from one another in their boundary conditions.

* The members of the surface code family are sometimes also described by more specific names:

  * The [toric code](https://en.m.wikipedia.org/wiki/Toric_code) is a surface code with periodic boundary conditions,

  * the planar code is one defined on a plane, etc.

* How many qubits are in a surface code? - While the surface code requires four-qubit measurements to encode a single logical qubit, we introduce families of quantum error correcting codes that use only three-qubit measurements. [Paper](https://www.ucl.ac.uk/quantum/news/2021/aug/subsystem-codes-outperform-surface-code)

* [Surface codes: Towards practical large-scale quantum computation](https://arxiv.org/abs/1208.0928)

* [Topological quantum memory (paper)](https://arxiv.org/abs/quant-ph/0110143)

* [My blog series introducing surface codes](http://decodoku.blogspot.com/2016/02/5-story-so-far_57.html)

* The surface codes can also be generalized to qudits. For more on that, [see here (Fault-tolerant quantum computation by anyons)](https://arxiv.org/abs/quant-ph/9707021)

**Excursus: Toric Code**

* For the toric code we don’t put our qubits in a line, we put them in a grid pattern.

* Video: [INTRODUCTION TO TOPOLOGICAL ORDER, DEMONSTRATION VIA THE TORIC CODE](https://www.youtube.com/watch?v=Rs2NMe4Lsbw&t=456s)

* https://leftasexercise.com/2019/03/25/qec-an-introduction-to-toric-codes/

* http://decodoku.blogspot.com/2016/03/6-toric-code.html

* http://decodoku.blogspot.com/2016/03/6-toric-code-part-2.html

* http://decodoku.blogspot.com/2016/03/8-toric-code-part-3.html

* http://decodoku.blogspot.com/2016/04/9-toric-code-part-4.html

* http://decodoku.blogspot.com/2016/04/10-toric-code-part-5.html

**Excursus: Color Code**

* [Fault-tolerant quantum computing with color codes](https://arxiv.org/abs/1108.5738)

* https://physics.stackexchange.com/questions/169176/quantum-error-correction-surface-code-vs-color-code

The color code and surface code are very similar. They are stabilizer codes composed of qubits arranged in two dimensions, requiring only geometrically local stabilizer measurements.

From the theory point of view, the codes are very similar. In fact, with collaborators we have proven that the color code is equivalent to a surface code (paper) up to a geometrically local unitary (one which only makes nearby qubits interact). One can think by analogy of the surface code* as a napkin with two rough and two smooth sides and the color code as folding this napkin along its diagonal. Because in the folded napkin, there are new things that are now close, it is possible to do more logical gates "transversally". This is good because it keeps errors from propagating and is relatively easy. However, the color code needs more qubits to interact in each stabilizer so ends up leading to a lower noise threshold. So one can say that although very similar, each code has its advantages and disadvantages.

At this point, only very small versions of either of these codes are being demonstrated. The Rainer Blatt group demonstrated the smallest possible color-code which also uses 7 qubits (this instance is also referred to as the Steane code). However, the underlying geometry in which the qubits are laid out in the Blatt setup is a linear chain of ions, so I would say that this is not the natural setting to extend to larger and larger system sizes.

**The superconducting qubit people (Martinis, IBM, DiCarlo, ...) on the other hand, are concentrating more on surface codes**. While in principle, their architecture should allow them to go full fledge 2D, for now, they are having the classical logic come in from the sides, which is something that needs to change.

*There is actually an ambiguity as to what to call surface codes, but I will refer to the quantum double of Z2 with rough and smooth boundaries defined by Bravyi and Kitaev (paper).

*Distance 2 Qubit Surface Code (Gate Circuit)*

Let's go down from the seven-qubit surface code to a simpler distance-2 surface code. The smallest meaningful one is a 2x2 lattice. But it is only an error detection code, because it is too small to do error correction.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1286.png)

**Green line is a flux line**. We put a magnetic flux through the SQUID loop of each qubit to tune it to the specific frequency where we want it to be. **Pink line is a charge line** that is used for single-qubit gates. All the rest (red, blue, purple box) is part of the readout. It's very important to have a good readout - we need to measure the ancilla qubits during the operation to see whether there was an error. You see a sort of resonator over each qubit (the things that look like old radiators) - this is very standard; it is a harmonic oscillator.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1287.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1288.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1289.png)

*7 Qubit Surface Code (Gate Circuit)*

Build a 7 qubit gate circuit. You can't correct the error, but you can detect it. At the end we verify by measuring the actual state of each qubit.

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1290.png)


We can run multiple measurements in (20) microseconds for Z and X operator:

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1291.png)

**D = 3 Surface Code Stabilizer Gate Sequence**

* Red: data qubits
* Blue: X-type ancilla qubit
* Green: Z-type ancilla qubit

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1300.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1301.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1302.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1303.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1304.png)


![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1305.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1306.png)


![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1307.png)


![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1308.png)



*Future of Quantum Error Correction*

* We still have problems such as leakage:
  * Quoted thresholds are derived from idealized models. Effects such as circular ZZ coupling can make things much harder.
  * Take a surface code that, with zero leakage in its CZ gates, operates below threshold at a given physical error rate. If you add 0.1 percent leakage to the CZ gates (the red line) and then do the decoding (the green line), you are suddenly no longer below the threshold.
* Also, if we want distance $n=17$ (mentioned in the beginning), an enormous amount of data comes out of the device:
  * We need $n^2 - 1 = 288$ physical (measurement) qubits per logical qubit for error correction. If one readout is 1 bit and each of these qubits is read out once per microsecond ($\mu$s), that is 288 bits per $\mu$s per logical qubit.
  * This amounts to 288 Gbit/s for 1000 logical qubits.
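
The data-rate estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the assumption of one readout bit per measured qubit per microsecond comes from the bullet points, not from any device specification.

```python
def measurement_qubits(distance: int) -> int:
    """Qubits read out per round for one logical qubit (n^2 - 1, as in the notes)."""
    return distance ** 2 - 1


def readout_rate_bits_per_second(distance: int, logical_qubits: int,
                                 rounds_per_second: int = 1_000_000) -> int:
    """Total readout data rate, assuming 1 bit per measured qubit per round."""
    return measurement_qubits(distance) * logical_qubits * rounds_per_second


bits_per_round = measurement_qubits(17)
rate = readout_rate_bits_per_second(17, logical_qubits=1000)
print(bits_per_round, "bits per microsecond per logical qubit")
print(rate / 1e9, "Gbit/s for 1000 logical qubits")
```

Keeping everything in integers (rounds per second rather than a cycle time in seconds) avoids floating-point rounding in the final figure.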

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1292.png)

###### *Anyons, Transmon and Fluxonium*

**Anyon (Quasi Particles)**

* https://de.m.wikipedia.org/wiki/Anyon

* PBS Video on Quasiparticles: https://youtu.be/le_ORQZzkmE

**Transmon and Fluxonium Qubit**

* Better hardware can protect against noise orders of magnitude better (to get down to 10^-5 instead of 10^-9).
* They can show coherence times above 1 millisecond; with fluxonium they reached single-qubit gate fidelities of 0.9999 (four nines).


![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1293.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1299.png)

*Source: https://theorie.physik.uni-konstanz.de/burkard/sites/default/files/images/Seminar_3_TrFl.pdf*


Video: [Google Keynote: Superconducting qubits for quantum computation: transmon vs fluxonium](https://www.youtube.com/watch?v=qsizrKrUZDg)

**Anyons, Qubit Braiding and Topological Quantum Computing**

https://dom-kufel.github.io/blog/2023-05-13-toric_code-intro/#loop-excitations-and-error-correction

https://www.quantamagazine.org/physicists-create-elusive-particles-that-remember-their-pasts-20230509/

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1568.png)



https://arthurpesah.me/blog/2023-05-13-surface-code/

https://dom-kufel.github.io/blog/2023-05-13-toric_code-intro/

https://arthurpesah.me/blog/2023-01-31-stabilizer-formalism-1/

https://arthurpesah.me/blog/2022-01-25-intro-qec-1/

https://blog.google/technology/research/an-important-step-towards-improved-quantum-computers/

https://phys.org/news/2023-05-google-quantum-ai-braids-non-abelian.html

https://www.spektrum.de/news/nichtabelsche-anyonen-auf-quantenprozessor-simuliert/2138241



 I am also frustrated by us and the media representing these anyon simulations as a "step towards fault-tolerance". Certainly topological codes are inspired by the physics of anyons and there are direct analogies between how those codes work and are realized and these simulations. I also have no doubt that certain aspects of these simulations have some degree of error robustness. But my understanding is that once you get serious about turning such simulations into a form of fault-tolerance appropriate for our devices and then try to optimize its realization you end up with the surface code. So while thinking about the physics of error-correction from this perspective might be useful and there are myriad mathematical similarities between what is going on in these simulations and topological codes, I think it is misleading/wrong to claim that performing these simulations is taking us any closer to fault-tolerance (whether the surface code or a different form of it). People keep telling me that I am misunderstanding this (and maybe I am!) but I have not been convinced of that yet.

my objection, which might be different from Cody's, is that the title of this blog post suggests that this experiment is getting us closer to realizing universal fault-tolerant quantum computers. In particular, when we say it is a step closer to improved quantum computers, it makes it sound like this approach is somehow a step on our roadmap for realizing fault-tolerant quantum computers, or that it is going to make our ultimate goal easier or something like that. That is reading between the lines a little bit, but it is how it sounds to me. Is that actually true? Or is this in the category of "nice demonstration, but not something we're going to follow up on, and thus not a step towards us improving quantum computers other than in the very broad sense that it helps test hardware capabilities like most physics team experiments do"? My understanding is that nothing about this experiment is likely to change anything about how we are planning to realize fault-tolerance. Maybe that is incorrect, or it is correct and I am just reading too much into the implications of statements like the title. It would be nice to understand this better.

The main sense in which the non-abelian anyons are useful for fault tolerance is that they are isomorphic to twists in the surface code, and you can use twists to store logical qubits. But we already knew about twists independent of this work e.g. from https://arxiv.org/abs/1609.04673 so I don't know what it adds on the fault tolerance side.

I did give a presentation to the authors of our abelian paper on how to do it fault tolerantly (they were technically "using" a surface code, but they were preparing it entirely unitarily which is not fault tolerant; you have to use measurements for everything). They weren't so interested I think, because the measurement version is harder and maybe also because they didn't consider it "really doing it" in the same way. I actually did run some quick shots of my versions of the circuits on the device back when pink was in M2 shape, and it worked, in that I got non-zero signal (the error rate was close to max but not max). I wasn't doing adept or etc or etc; if an experimentalist did it it would have been better. Also I now have much better versions of the circuit, which are described in https://arxiv.org/abs/2302.07395 .

My point was that if you interpret TQC broadly ("you have things that act like anyons") then this is demonstrating something interesting (multiparticle entangled EC state, FT gates on that state). But I also agree with Ryan in that I think the blog post does not convey this well.

Fair enough, I'm just trying to put forth a perspective on why it is fine to say this is an interesting experiment in topological quantum computation.  And if you want some idea of how it might lead to noise protection, yes  https://arxiv.org/abs/quant-ph/9912040 which was expanded upon in https://arxiv.org/abs/0907.3988

And indeed our own team's work investigating whether simulations of many-body systems do or do not give protection of topologically protected information is in this direction.

I think I agree with everything you've said Dave. But I don't think any of that really justifies the framing of the blog post title, which to me suggests that this is moving us closer to building a fault-tolerant quantum computer as opposed to exploring the error robustness of anyon simulation.

Anyons and transmons are related in quantum computing. Anyons are quasiparticles that can exist only in two-dimensional systems and have unique properties that make them well-suited for quantum computing. Transmons are physical implementations of qubits that can be used to build quantum computers.

One of the key properties of (non-abelian) anyons is that they can be braided around each other to perform quantum operations. The braiding of anyons is a topological operation: its effect depends only on the topology of the braid, not on the exact paths taken, so it is insensitive to small local perturbations from the environment. This makes anyons very resistant to noise, which is a major challenge in quantum computing.

Transmons, on the other hand, are not particles but superconducting circuits that act as qubits. Qubits are the basic unit of information in quantum computing. Transmons are manipulated using microwave pulses.

Transmons are not topologically protected, but they are much easier to implement in a practical quantum computer. This is because transmons can be manufactured with established fabrication techniques from the semiconductor industry.

So, anyons and transmons are both promising candidates for quantum computing. Anyons offer topological protection, but transmons are easier to implement. It is still not clear which approach will ultimately be more successful, but both anyons and transmons are active areas of research.

Here is a table that summarizes the key differences between anyons and transmons:

| Property | Anyons | Transmons |
|---|---|---|
| Topological protection | Yes | No |
| Easy to implement | No | Yes |
| Current research status | Active | Active |


> Video [Map of Superconductivity](https://youtu.be/bD2M7P6dTVA)


https://en.m.wikipedia.org/wiki/Jellium

https://de.m.wikipedia.org/wiki/Cooper-Paar

https://en.m.wikipedia.org/wiki/Bose–Einstein_condensate

https://opg.optica.org/oe/fulltext.cfm?uri=oe-2-8-299&id=63264

https://de.m.wikipedia.org/wiki/Phonon


Part of this correlation is the formation of pairs of electrons called Cooper pairs. According to Josephson, under certain circumstances these Cooper pairs move from one superconductor to the other across the thin insulating layer. Such motion of pairs of electrons constitutes the Josephson current, and the process by which the pairs cross the insulating layer is called Josephson tunneling.

https://www.britannica.com/science/Josephson-effect


Meissner effect

The Meissner effect is the expulsion of a magnetic field from the interior of a material that is in the process of becoming a superconductor, that is, losing its resistance to the flow of electrical currents when cooled below a certain temperature, called the transition temperature, usually close to absolute zero. The Meissner effect, a property of all superconductors, was discovered by the German physicists W. Meissner and R. Ochsenfeld in 1933.

https://slideplayer.com/slide/5010186/

SQUID (superconducting quantum interference device)

*Josephson Junction*

**Superconducting integrated circuits**

- The inductance of the Josephson junction is not constant but varies with how much current is flowing, making the circuit an anharmonic oscillator
- The potential energy of the junction is a cosine
- Low-energy excitations are pairs of electrons sloshing back and forth between the two antenna pads
- Transition from ground state to first excited state: 5 GHz
- Transition from first excited state to second excited state: 4.9 GHz (due to the flattened curve of the cosine)
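
The two transition frequencies above can be related to circuit parameters with a short calculation. The sketch below is not from the talk: it assumes the standard large-$E_J/E_C$ approximation for a transmon, $E_m \approx -E_J + \sqrt{8 E_J E_C}\,(m + \tfrac12) - \tfrac{E_C}{12}(6m^2 + 6m + 3)$, with energies expressed as frequencies in GHz. The function names are hypothetical.

```python
import math


def transmon_levels(e_j: float, e_c: float, n_levels: int = 3) -> list:
    """Approximate transmon energies (valid for E_J >> E_C), in the units of e_j and e_c."""
    plasma = math.sqrt(8.0 * e_j * e_c)
    return [
        -e_j + plasma * (m + 0.5) - (e_c / 12.0) * (6 * m * m + 6 * m + 3)
        for m in range(n_levels)
    ]


def parameters_from_transitions(f01: float, f12: float) -> tuple:
    """Invert the approximation: f12 - f01 = -E_C and f01 = sqrt(8 E_J E_C) - E_C."""
    e_c = f01 - f12
    e_j = (f01 + e_c) ** 2 / (8.0 * e_c)
    return e_j, e_c


e_j, e_c = parameters_from_transitions(5.0, 4.9)   # GHz, the values quoted in the notes
levels = transmon_levels(e_j, e_c)
f01 = levels[1] - levels[0]
f12 = levels[2] - levels[1]
print(f"E_J = {e_j:.2f} GHz, E_C = {e_c:.2f} GHz")
print(f"f01 = {f01:.3f} GHz, f12 = {f12:.3f} GHz")
```

In this approximation the gap between the two transitions is just the charging energy $E_C$, which is why a purely harmonic (quadratic) potential would give identical spacings and could not be used as a qubit.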

More details: https://www.youtube.com/watch?v=uD69GCYF9Zg&t=2023s


Transition dipole moment (Übergangsdipolmoment)

https://de.m.wikipedia.org/wiki/Übergangsdipolmoment

https://bsiegelwax.medium.com/i-love-neutral-atoms-47dd41b7a8d5

**Quantum Hardware**

* [superconducting qubits](https://en.m.wikipedia.org/wiki/Superconducting_quantum_computing)

  * [Building a quantum computer with superconducting qubits](https://www.youtube.com/watch?v=uPw9nkJAwDY)

* [trapped ions](https://en.m.wikipedia.org/wiki/Trapped_ion_quantum_computer)

* [liquid and solid state nuclear magnetic resonance](https://en.m.wikipedia.org/wiki/Nuclear_magnetic_resonance_quantum_computer)

* [optical cluster states](https://en.m.wikipedia.org/wiki/One-way_quantum_computer)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1272.JPG)

**Hardware**


[Simulations Using a Quantum Computer Show the Technology’s Current Limits](https://physics.aps.org/articles/v15/175)

https://www.techexplorist.com/introducing-unimon-new-superconducting-qubit-quantum-computers/54880/?amp

Rydberg atoms and measuring time: https://www.sciencealert.com/scientists-just-discovered-an-entirely-new-way-of-measuring-time

https://www.scinexx.de/news/physik/ein-moebiusband-aus-licht/

https://phys.org/news/2022-11-erbium-atoms-silicon-prime-candidate.html

https://medium.com/pasqal-io/why-analog-neutral-atoms-quantum-computing-is-the-most-promising-direction-for-early-quantum-77b462cefee0


https://phys.org/news/2022-11-quantum-component-graphene.amp

https://phys.org/news/2022-10-universal-parity-quantum-architecture-limitations.amp

https://www.golem.de/news/quantencomputer-silizium-chip-liest-quantenpunkte-in-rekordzeit-2211-169385.amp.html

https://aws.amazon.com/de/blogs/quantum-computing/an-illustrated-introduction-to-quantum-networks-and-quantum-repeaters/

https://scitechdaily.com/100-times-longer-than-previous-benchmarks-a-quantum-breakthrough/amp/

https://scitechdaily.com/physicists-create-first-quasiparticle-bose-einstein-condensate-the-mysterious-fifth-state-of-matter/amp/

https://www.eetimes.eu/quantum-computers-a-technology-assessment/

https://medium.com/qiskit/using-quantum-computers-to-tackle-complex-chemistry-simulations-with-quantum-embedding-7b7e4306b676

###### *ZZ-Crosstalks*

**ZZ: Residual ZZ coupling, circular ZZ coupling, ZZ crosstalk**

*  Noise is a significant obstacle to quantum computing, and 𝑍𝑍 crosstalk is one of the most destructive types of noise affecting superconducting qubits. Previous approaches to suppressing 𝑍𝑍 crosstalk have mainly relied on specific chip design that can complicate chip fabrication and aggravate decoherence. To some extent, special chip design can be avoided by relying on pulse optimization to suppress 𝑍𝑍 crosstalk. However, existing approaches are non-scalable, as their required time and memory grow exponentially with the number of qubits involved. https://arxiv.org/pdf/2202.07628.pdf

* In superconducting qubits, a destructive type of noise is known as 𝑍𝑍 crosstalk. This refers to an always-on 𝜎𝑧 ⊗ 𝜎𝑧 interaction between qubits connected by couplings, which originates from the interaction between the computational and non-computational energy levels of qubits.

* Different types of crosstalk:

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1295.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1296.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1297.png)

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1298.png)


*Source: https://www.youtube.com/watch?v=si5a9RJP01A&t=3645s*

* Second graph: the top red line is for perfect readout, and the black line is for at most 10% readout error
* We need very good readout and very small ZZ error
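
To see why an always-on 𝜎𝑧 ⊗ 𝜎𝑧 interaction is harmful, the following sketch evolves two idle qubits under $e^{-i\theta\, Z\otimes Z}$ and tracks how entangled they become. It is an illustration under stated assumptions, not a model of any specific device: `theta` stands for the accumulated coupling angle (coupling strength times idle time), and the helper names are made up.

```python
import numpy as np

Z = np.diag([1.0, -1.0])
ZZ_DIAGONAL = np.diag(np.kron(Z, Z))          # eigenvalues of Z (x) Z: [1, -1, -1, 1]


def zz_evolution(theta: float) -> np.ndarray:
    """Unitary exp(-i * theta * Z(x)Z); Z(x)Z is diagonal, so exponentiate entrywise."""
    return np.diag(np.exp(-1j * theta * ZZ_DIAGONAL))


def purity_of_first_qubit(state: np.ndarray) -> float:
    """Tr(rho_A^2) of the first qubit: 1 for a product state, 0.5 if maximally entangled."""
    amplitudes = state.reshape(2, 2)           # rows: qubit A, columns: qubit B
    rho_a = amplitudes @ amplitudes.conj().T   # partial trace over qubit B
    return float(np.real(np.trace(rho_a @ rho_a)))


plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
idle_state = np.kron(plus, plus)               # both qubits idle in |+>

for theta in (0.0, np.pi / 16, np.pi / 4):
    evolved = zz_evolution(theta) @ idle_state
    print(f"theta = {theta:.3f}: purity of qubit A = {purity_of_first_qubit(evolved):.3f}")
```

The purity follows $\tfrac12 + \tfrac12\cos^2(2\theta)$, so even qubits that are nominally doing nothing drift into an entangled state, which looks like dephasing when each qubit is viewed on its own.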

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1294.png)

###### *Quantum Circuits and Geometry: limitations on the power of quantum computers*

Nielsen's (differential-)geometric approach to circuit complexity

https://arxiv.org/abs/quant-ph/0603161

https://arxiv.org/abs/quant-ph/0701004


###### *Gentle Measurement (and Shadow Tomography)*

https://arxiv.org/abs/1904.08747

**Paper: On quantum backpropagation, information reuse, and cheating measurement collapse**

The proposed algorithm is presented in Figure 1 of the PDF file. It highlights the reduction in quantum resources due to the ability to reuse intermediate information and perform measurements in a way that minimizes disturbance to the quantum state.

However, achieving backpropagation scaling is impossible without access to multiple copies of a state, which is enabled by recent developments in shadow tomography.

* **The origin of the challenge lies within quantum measurement collapse and the inability to read out intermediate states while continuing a computation**, rather than the probabilistic formulation of the problem.

* **In the classical setting, one is always promised to be in a computational basis state**, making it possible to do perfect measurements non-destructively at intermediate steps.

* Although Proposition 3 presents a strict lower bound ruling out backpropagation in the quantum data case with single copies, **this leads one to wonder whether backpropagation scaling is possible when one has access to multiple copies**. Moreover, destructive quantum measurements are the inhibitor of backpropagation scaling in single copies, so perhaps there is some middle ground where one could perform measurements that are only partially destructive on multiple copies. This idea has led to breakthroughs in the shadow tomography problem, which we examine next in the context of backpropagation.

* Interesting: "If the difficulty to achieve an efficient scaling is due to inherently quantum properties, **perhaps backpropagation is not the correct method for optimization of quantum models, which seems to be a growing belief for classical models too, albeit for completely different reasons** (=Hinton 2022)"

*How can information reuse be leveraged to improve the efficiency of quantum neural network training?*

* Information reuse is what makes classical backpropagation efficient: by reusing intermediate information, backpropagation computes gradients at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters.

* One expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation.

* Recent developments in shadow tomography have challenged this notion by assuming access to multiple copies of a quantum state. The paper investigates whether parameterized quantum models can train as efficiently as classical neural networks and shows that achieving backpropagation scaling is impossible without access to multiple copies of a state.
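
The "additional factor proportional to the number of parameters" can be made concrete with a toy example. The sketch below does not come from the paper: it assumes the commonly used parameter-shift rule for gradients and a single-qubit model made of $M$ successive $R_Y(\theta_i)$ rotations on $|0\rangle$, for which $\langle Z\rangle = \cos(\sum_i \theta_i)$. The class `CountingModel` is hypothetical and only counts how often the "circuit" is evaluated.

```python
import math


class CountingModel:
    """Toy circuit: RY(theta_1)...RY(theta_M) on |0>, then <Z> = cos(sum of angles)."""

    def __init__(self):
        self.executions = 0

    def expectation(self, thetas):
        self.executions += 1            # one call stands for one batch of circuit runs
        return math.cos(sum(thetas))


def parameter_shift_gradient(model, thetas):
    """Gradient via the parameter-shift rule: two evaluations per parameter."""
    gradient = []
    for i in range(len(thetas)):
        shifted_up = list(thetas)
        shifted_down = list(thetas)
        shifted_up[i] += math.pi / 2
        shifted_down[i] -= math.pi / 2
        gradient.append(0.5 * (model.expectation(shifted_up) - model.expectation(shifted_down)))
    return gradient


model = CountingModel()
thetas = [0.1, 0.2, 0.3]
gradient = parameter_shift_gradient(model, thetas)
print("gradient:", gradient)
print("circuit evaluations:", model.executions, "for", len(thetas), "parameters")
```

The gradient needs $2M$ evaluations for $M$ parameters, whereas classical backpropagation would obtain all $M$ partial derivatives at roughly the cost of a constant number of function evaluations.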

*What is shadow tomography and how is it used in this context?*

* Shadow tomography is a technique that assumes access to multiple copies of a quantum state and uses a hypothesis state model, often stored classically, to require a minimal number of destructive measurements.

* In the context of quantum neural network training, recent developments in shadow tomography have challenged the notion that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation.

* By assuming access to multiple copies of a quantum state, a modification of existing shadow tomography routines enables backpropagation scaling if one restricts costs to the quantum overhead and ignores the classical cost incurred to implement known shadow tomography schemes.

*What is shadow tomography and how is it used in the context of quantum computing?*

* Shadow tomography, in the context of quantum computing, is a technique for characterizing an unknown quantum state from the statistics of its measurement outcomes. Rather than reconstructing the full density matrix, it aims to predict the outcome probabilities of a given set of measurements from as few copies of the state as possible.

* In quantum computing, qubits represent the fundamental units of information and are typically manipulated in superposition states. However, obtaining complete knowledge about the quantum state of a qubit is challenging due to limitations such as noise, decoherence, and the no-cloning theorem. Shadow tomography offers a way to infer information about the state without requiring full access to it.

* The technique involves performing a series of measurements on a large number of identically prepared quantum systems in the same unknown state. Each measurement provides partial information about the state, revealing statistical probabilities for the different measurement outcomes. By gathering a sufficiently large set of measurement statistics, it becomes possible to infer the likely quantum state.

* The reconstruction of the quantum state from the collected statistics is achieved through mathematical techniques, such as maximum likelihood estimation or compressed sensing algorithms. These methods leverage the statistical patterns observed in the measurement outcomes to estimate the underlying quantum state that generated them.

* Shadow tomography provides a means to characterize unknown quantum states in a non-invasive manner, making it particularly useful for quantum systems that are difficult or impossible to access directly. It plays a crucial role in quantum information processing tasks such as quantum error correction, state verification, and characterizing the performance of quantum devices.


*What is gentle measurement in the context of quantum computing?*

* Gentle measurement techniques often employ strategies such as weak measurements, quantum non-demolition measurements, or adaptive measurements.

* Weak measurements involve extracting partial information about the system while causing minimal disturbance.

* Quantum non-demolition measurements are designed to measure one property of the system without affecting other compatible observables.

* Adaptive measurements dynamically adjust the measurement strategy based on the intermediate measurement results to minimize disturbance.

* By using gentle measurement techniques, researchers can gather information about quantum states while preserving their fragile nature. This is particularly important in quantum computing, where maintaining the coherence of quantum bits (qubits) is crucial for performing reliable computations and preventing errors.

###### *Shadow Tomography (States)*

**quantum state tomography**, and it is already an important task in experimental physics. To give some examples, tomography has been used to obtain a detailed picture of a chemical reaction (namely, the dissociation of I2 molecules) [29]; to confirm the preparation of three-photon [28] and eight-ion [20] entangled states; to test controlled-NOT gates [26]; and to characterize optical devices [15].
Physicists would like to scale up tomography to larger systems, in order to study the many-particle entangled states that arise (for example) in chemistry, condensed-matter physics, and quantum information. But there is a fundamental obstacle in doing so. This is that, to reconstruct an n-qubit state, one needs to measure a number of observables that grows exponentially in n: in particular like 4^n, the number of parameters in a 2^n × 2^n density matrix. This exponentiality is certainly a practical problem: Häffner et al. [20] report that, to reconstruct an entangled state of eight calcium ions, they needed to perform 656,100 experiments! But to us it is a theoretical problem as well. For it suggests that learning an arbitrary state of (say) a thousand particles would take longer than the age of the universe, even for a being with unlimited computational power.
Source: The Learnability of Quantum States
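
The exponential growth quoted above is easy to tabulate. In this sketch the function names are illustrative. The reading of 656,100 as $3^8$ Pauli measurement settings times 100 repetitions per setting is an assumption that reproduces the number; the quoted text itself does not state it.

```python
def density_matrix_parameters(n_qubits: int) -> int:
    """Real parameters in a 2^n x 2^n density matrix (4^n, minus 1 for unit trace)."""
    return 4 ** n_qubits - 1


def pauli_settings(n_qubits: int) -> int:
    """Number of measurement settings if every qubit is measured along X, Y or Z."""
    return 3 ** n_qubits


for n in (1, 2, 8, 20):
    print(n, density_matrix_parameters(n), pauli_settings(n))

# Assumption: 100 repetitions per setting reproduces the figure quoted for eight ions.
experiments_eight_ions = pauli_settings(8) * 100
print(experiments_eight_ions)
```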

Quantum shadow tomography gives a way of predicting the values of most quantum observables using a small number of samples from a quantum system.

Aaronson S 2018 Shadow tomography of quantum states Proc. of the 50th Annual ACM SIGACT Symp. on Theory of Computing pp 325–38

https://arxiv.org/abs/1711.01053

> ***Sample complexity of quantum state tomography (e.g. shadow tomography) important for difference between coherent and incoherent measurements***


**What is the shadow tomography problem in quantum information theory?**

In quantum information theory, the shadow tomography problem (Aaronson 2018) is the following: given copies of an unknown $D$-dimensional quantum state $\rho$ and a known list of two-outcome measurements $E_1, \dots, E_M$, estimate the probability that each $E_i$ accepts $\rho$ to within additive error $\varepsilon$, using as few copies of $\rho$ as possible. The point is to predict these measurement statistics without reconstructing the full state.

The problem arises because of the limitations of directly measuring quantum states. The act of measurement disturbs the system and can collapse its quantum state, so a single copy cannot simply be measured again and again, and full tomography needs a number of copies that grows exponentially with the number of qubits.

To overcome these challenges, shadow tomography collects statistical data by measuring identically prepared copies of the state and recording the outcomes.

The goal is to use the collected data to produce estimates (often via a classically stored hypothesis state) that are consistent with the observed outcomes for all $M$ measurements. Aaronson showed that a number of copies polylogarithmic in both $M$ and $D$ suffices.

Solving the shadow tomography problem is important in various quantum information tasks, including quantum state verification, characterization of quantum devices, and quantum error correction. By obtaining accurate reconstructions of unknown quantum states, researchers can gain insights into the properties of quantum systems and develop techniques to mitigate errors and improve quantum information processing tasks.

**What is the exponential classical cost within the shadow tomography problem?**

The term "exponential classical cost" in the context of the shadow tomography problem refers to the computational complexity associated with classical methods of characterizing or reconstructing an unknown quantum state. It signifies that the amount of computational resources required grows exponentially with the size of the quantum system.

In shadow tomography, the goal is to estimate the quantum state by collecting measurement statistics. The challenge lies in finding an efficient algorithm to reconstruct the state from this statistical data. Classical methods typically involve searching through a large space of possible quantum states and optimizing a likelihood function or another measure of agreement with the observed statistics.

However, for large quantum systems, the number of possible quantum states grows exponentially with the number of qubits. As a result, exhaustive classical methods that explore the entire state space become computationally infeasible as the system size increases. This exponential growth in the number of possible states leads to an exponential increase in the computational cost required for reconstruction.

The exponential classical cost in shadow tomography highlights the need for alternative approaches, such as leveraging quantum computers or developing more efficient reconstruction algorithms. Quantum computers can potentially provide speedup by harnessing the principles of quantum mechanics, allowing for more efficient exploration of the state space or utilizing specialized quantum algorithms tailored for state reconstruction.

Reducing the classical cost of shadow tomography is an active area of research, and various techniques, including compressed sensing, machine learning, and adaptive measurement strategies, have been explored to mitigate the computational challenges associated with large-scale quantum systems.

**What is meant by incoherent and coherent measurements in sample complexity (for shadow tomography)?**

In the context of learning a quantum state from copies, coherent and incoherent measurements refer to two different ways of measuring the available copies.

1. **Coherent measurements**: Several copies of the state are held at the same time and measured jointly, with a single (generally entangled) measurement on $\rho^{\otimes k}$. This is the most general setting and requires a quantum memory that can store many copies.

2. **Incoherent measurements**: Each copy of the state is measured separately, one at a time. The choice of measurement may depend on earlier outcomes, but no joint measurement across copies is performed, so no quantum memory is needed.

> The difference between coherent and incoherent measurements becomes ***important when we consider the sample complexity of quantum state tomography, or, here, shadow tomography.***

Quantum state tomography is the process of determining the unknown state of a quantum system from the outcomes of measurements made on the system. The sample complexity of this process refers to the number of copies of the state (and hence measurements) needed to accurately reconstruct the quantum state.

Coherent measurements can offer advantages in terms of reduced sample complexity, since a joint measurement can extract more information per copy of the state. However, they are more technically challenging to implement than incoherent measurements, and they may introduce additional sources of error.

Shadow tomography protocols estimate properties of the quantum state from a limited number of copies, for example from a small number of random measurements (or "shadows"), and their sample complexity can differ depending on whether coherent or incoherent measurements are allowed. The detailed analysis depends on the specific protocol and on the class of observables to be predicted.
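
As an illustration of estimating properties from random single-copy (incoherent) measurements, here is a minimal single-qubit sketch of the classical-shadows protocol with random Pauli bases (Huang, Kueng and Preskill). Note that this is a different procedure from Aaronson's shadow tomography algorithm; it is used here because it is short to simulate. The state, sample count and function names are hypothetical choices.

```python
import numpy as np


def sample_pauli_shadows(bloch_vector, n_samples, rng):
    """Measure each copy in a random Pauli basis (0=X, 1=Y, 2=Z); return bases and +/-1 outcomes."""
    bloch_vector = np.asarray(bloch_vector, dtype=float)
    bases = rng.integers(0, 3, size=n_samples)
    prob_plus = 0.5 * (1.0 + bloch_vector[bases])      # Born rule for a Pauli measurement
    outcomes = np.where(rng.random(n_samples) < prob_plus, 1, -1)
    return bases, outcomes


def estimate_pauli_expectations(bases, outcomes):
    """Average the snapshots (I + 3 s P)/2: each gives 3*s for the measured Pauli, 0 otherwise."""
    return np.array([np.mean(3.0 * outcomes * (bases == k)) for k in range(3)])


rng = np.random.default_rng(seed=0)
true_bloch = np.array([0.6, 0.0, 0.8])                 # hypothetical pure state on the Bloch sphere
bases, outcomes = sample_pauli_shadows(true_bloch, n_samples=20_000, rng=rng)
estimates = estimate_pauli_expectations(bases, outcomes)
print("true     <X>, <Y>, <Z>:", true_bloch)
print("estimate <X>, <Y>, <Z>:", np.round(estimates, 3))
```

The same recorded data can be reused afterwards to estimate any of the three Pauli expectation values, which is the sense in which a "shadow" of the state is stored classically.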


https://en.wikipedia.org/wiki/Quantum_tomography

https://en.wikipedia.org/wiki/Classical_shadow

Scott Aaronson: Shadow Tomography https://arxiv.org/abs/1711.01053

https://www.spektrum.de/news/quantennetzwerke-zeigen-schwarmintelligenz/2116689

Shadow tomography is related to the characterization of quantum states, often referred to as "quantum state tomography."

In quantum mechanics, the state of a quantum system is described by a mathematical object known as a quantum state. Quantum state tomography aims to reconstruct or characterize the complete quantum state of a system by performing measurements on the system in different bases or using various observables.

Shadow tomography of quantum states is a specific technique for learning about a quantum state. It involves measuring the "shadows" or projections of the quantum state onto a set of measurement operators or observables. These measurements provide partial information about the quantum state, similar to the way shadows reveal information about the object casting them.

The procedure typically involves preparing many identical copies of the unknown state, applying a set of measurements or observables to the copies, and then collecting the measurement outcomes. By repeating this process for a sufficiently large number of copies and measurements, it becomes possible to predict the properties of interest through computational methods, without necessarily reconstructing the complete quantum state.

The reconstructed quantum state provides valuable information about the system's properties, such as its purity, entanglement, coherence, and other relevant characteristics. This information is crucial for understanding and characterizing quantum systems, as well as for verifying the performance of quantum devices and protocols.

Quantum state tomography, including shadow tomography, plays a vital role in the development, verification, and optimization of quantum algorithms, quantum error correction, quantum communication, and other applications of quantum information science.

It's important to note that quantum state tomography can be a challenging task due to various factors, such as noise, imperfections in measurements, and the exponential growth of the state space with the number of quantum bits (qubits). Researchers continually work on improving the efficiency and accuracy of quantum state tomography techniques to overcome these challenges.

###### *Quantum Process Tomography*

https://arxiv.org/abs/quant-ph/0702131

**Quantum Tomography?**

Quantum tomography is an experimental procedure to reconstruct a description of part of a quantum system from the measurement outcomes of a specific set of experiments. Qiskit implements the following types of tomography:

**Quantum state tomography**: Given a state-preparation circuit that prepares a system in a state, reconstruct a description of the density matrix 𝜌 of the actual state obtained in the system.

**Quantum process tomography**: Given a circuit, reconstruct a description of the quantum channel that describes the circuit’s operator when running on the system.

**Quantum gate set tomography**: Performs process tomography on a set of gates in a self-consistent manner, meaning that noise on the gates used by the tomography procedure itself is also taken into account.

The Qiskit tutorial linked below gives examples of how to use the `ignis.verification.tomography` modules.

Quantum tomography refers to a suite of techniques used in quantum computing and quantum information theory to reconstruct a description of a part of a quantum system based on measurement data. The word "tomography" comes from medical imaging, where it's used to denote techniques that build a 3D image of an object from 2D slices.

In quantum mechanics, there are two main types of tomography: state tomography and process tomography.

1. **Quantum State Tomography**: This is a method used to determine the quantum state of a system. The goal is to construct the density matrix that represents the state of the system. This is done by making a series of measurements on identically prepared quantum systems and using the statistical results of these measurements to reconstruct the state. State tomography is a challenging problem in general, as the number of parameters to be estimated grows exponentially with the number of qubits.

2. **Quantum Process Tomography**: This extends the concept of state tomography to quantum operations or processes. The goal here is to determine the quantum operation that a particular quantum system implements. In other words, given access to a quantum process (which could be a quantum gate, a sequence of quantum gates, or a quantum channel), the task is to identify the corresponding quantum map. This is done by preparing a set of input states, applying the process, and performing state tomography on the outputs. The data are then used to reconstruct the process matrix or Choi matrix, which fully describes the quantum operation. Again, process tomography is also a challenging problem, as the number of parameters to be estimated grows rapidly with the number of qubits.

Quantum tomography is a fundamental tool for quantum characterization, verification, and validation, which are crucial tasks in quantum computing and quantum information processing. However, due to the complexity of these tasks, research is being conducted into more efficient methods, such as compressed sensing and other methods based on assumptions about the structure of the states or processes being measured.

Qiskit:

https://qiskit.org/documentation/stable/0.26/tutorials/noise/8_tomography.html

https://qiskit.org/documentation/tutorials/simulators/2_device_noise_simulation.html

[How do I change the fitter algorithm in a state tomography experiment? - 1 Minute Qiskit](https://www.youtube.com/watch?v=MsENB0n4Ac8)

**Choi matrix = Process Matrix, which fully describes the quantum operation**

https://en.m.wikipedia.org/wiki/Choi%27s_theorem_on_completely_positive_maps
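
To make the Choi matrix concrete, here is a minimal NumPy sketch. It assumes the convention $J = \sum_{a,b} |a\rangle\langle b| \otimes \mathcal{E}(|a\rangle\langle b|)$ (other texts swap the tensor factors or normalize by the dimension), and the helper names `choi_matrix` and `depolarizing` are illustrative, not part of any library. The example channel is a single-qubit depolarizing channel chosen only because it is easy to write down.

```python
import numpy as np

def choi_matrix(channel, d):
    """Choi matrix J = sum_ab |a><b| (x) channel(|a><b|) of a linear map on d x d matrices."""
    j = np.zeros((d * d, d * d), dtype=complex)
    for a in range(d):
        for b in range(d):
            e_ab = np.zeros((d, d), dtype=complex)
            e_ab[a, b] = 1.0
            j += np.kron(e_ab, channel(e_ab))
    return j

def depolarizing(rho, p=0.1):
    """Illustrative channel: keep rho with weight 1-p, replace it by I/d with weight p."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d

J = choi_matrix(depolarizing, 2)

# Complete positivity <=> J is positive semidefinite.
choi_eigenvalues = np.linalg.eigvalsh(J)
# Trace preservation <=> tracing out the output factor gives the identity.
reduced_input = np.einsum("aibi->ab", J.reshape(2, 2, 2, 2))
```

Process tomography estimates exactly this matrix from data; the two checks above (positive semidefinite, identity after tracing out the output) are the constraints a fitter typically enforces on the reconstruction.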

**What is quantum state tomography?**

Quantum state tomography is the process of determining the state of a quantum system based on the results of measurements. In essence, it's a method used to figure out the unknown quantum state of a system. It's called "tomography" because, like a CT scan in medicine, it combines many different measurements to construct a full picture of something you can't directly observe in one shot.

For a quantum system, the state is typically represented by a density matrix. In the simplest case, for a single qubit, the density matrix is a 2x2 Hermitian matrix that is positive semi-definite and has trace 1. The generalization to multiple qubits is a matrix that is 2^n x 2^n (where n is the number of qubits), but with the same properties.

Quantum state tomography involves making various measurements on many identically-prepared copies of the quantum system and then using the results to reconstruct the density matrix. The measurements you make are usually in different bases. For a single qubit, measurements are often made in the X, Y, and Z bases, which correspond to the Pauli matrices.

However, as the number of qubits increases, the number of measurements you need to make grows exponentially, which makes full quantum state tomography unfeasible for large systems. For a system of n qubits, the density matrix contains 4^n - 1 independent real parameters, which is why the number of measurements needed increases so rapidly.

Therefore, in practice, people use other methods to get partial information about the state or use assumptions to reduce the number of necessary measurements. There are also methods that attempt to directly estimate quantities of interest, like the expectation value of an observable, without fully reconstructing the state.
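
The single-qubit procedure described above (measure in the X, Y, and Z bases, then rebuild the density matrix) can be sketched with plain NumPy. This is a simulation under stated assumptions: the "device" is a known density matrix `rho_true`, measurement outcomes are drawn from a seeded random generator, and reconstruction uses simple linear inversion, $\rho = \tfrac{1}{2}(I + \langle X\rangle X + \langle Y\rangle Y + \langle Z\rangle Z)$. The function names are illustrative and unrelated to the Qiskit API.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def sample_expectation(rho, pauli, shots, rng):
    """Estimate <pauli> from `shots` simulated +1/-1 measurement outcomes."""
    exact = np.real(np.trace(rho @ pauli))
    p_plus = np.clip((1 + exact) / 2, 0.0, 1.0)  # probability of outcome +1
    plus_counts = rng.binomial(shots, p_plus)
    return 2 * plus_counts / shots - 1

def linear_inversion(expectations):
    """Rebuild rho = (I + <X> X + <Y> Y + <Z> Z) / 2 from Pauli expectation values."""
    rho = I2.copy()
    for name, value in expectations.items():
        rho = rho + value * PAULIS[name]
    return rho / 2

# Assumed test state: a slightly mixed |+> state.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_true = 0.9 * np.outer(plus, plus.conj()) + 0.1 * I2 / 2

rng = np.random.default_rng(seed=0)
estimates = {name: sample_expectation(rho_true, p, 20_000, rng)
             for name, p in PAULIS.items()}
rho_est = linear_inversion(estimates)
```

Linear inversion always returns a Hermitian, trace-one matrix, but with finite shots it can have a slightly negative eigenvalue; practical fitters (least squares with constraints, maximum likelihood) exist to repair that.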

**What are the alternative methods to quantum state tomography?**

Indeed, full quantum state tomography is often impractical due to the exponential number of measurements required as the number of qubits increases. Here are some alternative methods that are commonly used in practice:

1. **Quantum Process Tomography**: Instead of determining the state of a quantum system, quantum process tomography aims to determine the quantum operation or quantum channel that describes a quantum process. It answers a different question rather than being cheaper: it is the relevant tool when you want to characterize a quantum gate or a quantum circuit, but a general $n$-qubit channel has $16^n - 4^n$ real parameters, so it scales even worse than state tomography.

2. **Direct Estimation of Expectation Values**: Rather than attempting to reconstruct the full quantum state, you can directly estimate the expectation value of a specific observable by making many measurements in the basis corresponding to that observable and averaging the results.

3. **Quantum State Estimation**: Quantum state estimation techniques use a set of projective measurements to estimate the state of a quantum system. This is similar to quantum state tomography, but instead of trying to reconstruct the full state, it only attempts to estimate it based on the results of projective measurements.

4. **Quantum Compressed Sensing**: Compressed sensing exploits structure in the state, typically a low-rank density matrix (or sparsity in a certain basis), to reduce the number of measurements required. It has been applied in the context of quantum state tomography, especially for states that are known to be close to pure.

5. **Machine Learning Techniques**: Machine learning methods, including neural networks and other techniques, have been increasingly used for quantum state tomography and characterization. These methods can often handle higher-dimensional data and can help reduce the number of measurements required.

6. **Randomized Benchmarking**: This is a method for characterizing the performance of quantum gates by applying sequences of random gates followed by their inverse and seeing how often the system returns to the initial state.

Remember, the choice of method depends on the specific task, the number of qubits, and the resources available. Each method has its strengths and limitations and might be suited to different experimental setups or goals.



###### *Trotterization*

https://arxiv.org/abs/1811.08017

https://web.ma.utexas.edu/users/drp/files/Fall2020Projects/DRP_Final_Project_F2020%20-%20Grant%20E%20Kluber.pdf

https://apctp.org/temp_file/Lecture%2011.pdf

**Trotterization**

https://vtomole.com/blog/2019/04/07/trotter

Several quantum Krylov algorithms have been proposed over the last few years, with the majority focusing on creating the Krylov space from time-evolved states with the given Hamiltonian. These methods are especially appealing for existing quantum computers, since one can obtain crude approximations of time evolutions using low-depth quantum circuits (i.e., Trotterization).

https://medium.com/qiskit/a-mainstay-of-classical-ground-state-algorithms-gets-a-quantum-speedup-8003687dbb0d

**Trotterization** is a technique used in quantum computing to simulate the evolution of a quantum system under a Hamiltonian that is a sum of many terms. The idea is to break down the evolution into a sequence of smaller evolutions, each of which is under a single term of the Hamiltonian. This can be done because, for short evolution times, the exponential of a sum of matrices is approximately equal to the product of the exponentials of the individual matrices, with an error governed by the commutators of the terms.

The Trotterization formula is:

> $e^{-iHt} \approx \left( e^{-iH_0t/r} \cdot e^{-iH_1t/r} \cdots e^{-iH_{d-1}t/r} \right)^r$

where $H = \sum_{i=0}^{d-1} H_i$ is the Hamiltonian, $t$ is the time, $r$ is the number of Trotter steps, and $H_i$ are the $d$ individual terms of the Hamiltonian.

The error in the Trotterization approximation can be controlled by increasing the number of Trotter steps. However, this also increases the complexity of the quantum circuit that is needed to implement the Trotterized evolution.

Trotterization is a powerful technique that can be used to simulate a wide variety of quantum systems. However, it is important to note that the error in the Trotterization approximation can be significant, especially for long times or for Hamiltonians whose terms have large commutators; if all terms commute, the formula is exact.

Here are some additional details about Trotterization:

* The Trotter product formula was introduced by Hale Trotter in 1959.
* For the first-order formula above, the total error is of the order of $O(t^2/r)$: each of the $r$ steps contributes an error of order $O(t^2/r^2)$.
* The number of Trotter steps needed to achieve a given accuracy depends on the Hamiltonian and the desired accuracy.
* Trotterization can be used to simulate the evolution of both closed and open quantum systems.
* Trotterization is a key technique in many quantum algorithms, such as quantum simulation, quantum chemistry, and quantum machine learning.
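
The first-order formula and its error scaling can be checked numerically. The sketch below is a toy example under stated assumptions: a single qubit with $H = X + Z$ (two non-commuting terms chosen only for illustration), matrix exponentials computed by eigendecomposition, and error measured in the spectral norm. The names `expm_hermitian` and `trotter_error` are illustrative.

```python
import numpy as np

def expm_hermitian(h, t):
    """Return exp(-i h t) for a Hermitian matrix h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def trotter_error(t, r):
    """Spectral-norm distance between exp(-i(X+Z)t) and r first-order Trotter steps."""
    exact = expm_hermitian(X + Z, t)
    step = expm_hermitian(X, t / r) @ expm_hermitian(Z, t / r)
    approx = np.linalg.matrix_power(step, r)
    return np.linalg.norm(exact - approx, ord=2)

errors = {r: trotter_error(1.0, r) for r in (1, 2, 4, 8, 16, 32)}
for r, err in errors.items():
    print(f"r = {r:2d}  error = {err:.4f}")
```

Doubling $r$ should roughly halve the error once $r$ is large enough, which is the $O(t^2/r)$ behaviour of the first-order formula; higher-order (Suzuki) formulas trade more exponentials per step for faster convergence.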

###### *Qubitization*

**Qubitization**

"Qubitization" is a quantum simulation technique introduced by Low and Chuang (paper linked below). Given a Hamiltonian supplied as a block encoding, i.e. as a block of a larger unitary, it constructs a walk operator that acts on invariant two-dimensional subspaces, each behaving like a single qubit, so that the Hamiltonian's eigenvalues are encoded in rotation angles. Qubitization offers an approach for performing certain computational tasks with a significant reduction in resource requirements compared to other quantum algorithms.

A primary example of the use of qubitization is Hamiltonian simulation, a fundamental problem in quantum physics. Combined with phase estimation, it allows the efficient estimation of quantities such as ground state energy, which are of significant interest in quantum chemistry and condensed matter physics.

The key advantage of qubitization over other quantum simulation algorithms is that it allows a more efficient use of quantum resources. However, the technique typically requires a more complex set of quantum gates and remains a subject of active research and development in the field of quantum computing.

https://arxiv.org/abs/1610.06546

###### *QRAM*

For example, the paper linked below points out that high-performance parallel systems like TPUs tend to keep the ratio of CPUs to memory within a range, as opposed to going all memory or all CPU. The same kinds of tradeoffs apply to quantum memory, especially if it is error corrected, which itself requires CPUs. The paper essentially collects the arguments against QRAM as a physical device that doesn't use error correction.

https://arxiv.org/abs/2305.10310

###### *Block Encoding (Gibbs-Sampling)*

https://arxiv.org/abs/1806.01838

Quantum singular value transformation and beyond

aaronson

https://arxiv.org/abs/2203.10236

**Block Encoding**

Block encoding is a framework for developing a variety of quantum algorithms by encoding a matrix as a block of a unitary. There are various ways to implement it depending on the requirements; one of them is to decompose the matrix into a linear combination of displacement matrices.
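
As a concrete illustration of "a matrix as a block of a unitary", the sketch below uses the textbook one-ancilla construction for a Hermitian matrix $A$ with $\|A\| \le 1$: $U = \begin{pmatrix} A & \sqrt{I - A^2} \\ \sqrt{I - A^2} & -A \end{pmatrix}$. This is not the displacement-matrix decomposition mentioned above, only the simplest dense-matrix construction; the function names and the example matrix are assumptions made for illustration.

```python
import numpy as np

def sqrtm_psd(m):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def block_encode(a):
    """Unitary [[A, S], [S, -A]] with S = sqrt(I - A^2); needs Hermitian A, ||A|| <= 1."""
    n = a.shape[0]
    s = sqrtm_psd(np.eye(n) - a @ a)
    return np.block([[a, s], [s, -a]])

X = np.array([[0, 1], [1, 0]], dtype=float)
Z = np.array([[1, 0], [0, -1]], dtype=float)
A = 0.5 * X + 0.3 * Z          # Hermitian, spectral norm below 1

U = block_encode(A)
top_left = U[:2, :2]           # (<0| (x) I) U (|0> (x) I) with the ancilla as first qubit
```

Here `U` acts on one ancilla qubit plus the system, and projecting the ancilla onto $|0\rangle$ before and after recovers $A$. Quantum singular value transformation then applies polynomial functions to that block, for example an approximation of $1/x$ as in HHL.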


block encodings (unitary, signal processing, transformation)

https://quantumcomputing.stackexchange.com/questions/18236/block-encoding-technique-what-is-it-and-what-is-it-used-for

https://de.wikipedia.org/wiki/Gibbs-Sampling

Arxiv: [Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics](https://arxiv.org/abs/1806.01838)

In HHL f(x) = 1/x. Use Singular Value Transformation to approximate it! https://www.youtube.com/watch?v=L40UUDxPEbE&list=WL&index=3&t=215s

https://quantumcomputing.stackexchange.com/questions/18197/in-the-context-of-block-encoding-what-does-0-rangle-otimes-i-represent

*(2021) Quantum **Krylov subspace** algorithms for ground and excited state energy estimation*

https://arxiv.org/abs/2109.06868

https://en.m.wikipedia.org/wiki/Krylov_subspace


https://quantum-journal.org/papers/q-2023-05-23-1018/

We present an algorithm that uses block encoding on a quantum computer to exactly construct a Krylov space, which can be used as the basis for the Lanczos method to estimate extremal eigenvalues of Hamiltonians. While the classical Lanczos method has exponential cost in the system size to represent the Krylov states for quantum systems, our efficient quantum algorithm achieves this in polynomial time and memory. The construction presented is exact in the sense that the resulting Krylov space is identical to that of the Lanczos method, so the only approximation with respect to the exact method is due to finite sample noise. This is possible because, unlike previous quantum Krylov methods, our algorithm does not require simulating real or imaginary time evolution. We provide an explicit error bound for the resulting ground state energy estimate in the presence of noise.
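
For orientation, here is what the classical side of this looks like: build the Krylov space $\mathrm{span}\{v, Hv, H^2 v, \dots\}$, project $H$ onto it, and take the smallest eigenvalue of the projected matrix as a ground state energy estimate. This is a plain NumPy sketch, not the quantum algorithm from the paper: the Hamiltonian is a random symmetric stand-in, the vectors are stored explicitly (which is exactly the exponential cost the abstract refers to for real quantum systems), and a QR factorization replaces the Lanczos three-term recurrence for brevity.

```python
import numpy as np

def krylov_ground_energy(h, v0, dim):
    """Smallest Ritz value of h in the Krylov space span{v0, h v0, ..., h^(dim-1) v0}."""
    vectors = [v0 / np.linalg.norm(v0)]
    for _ in range(dim - 1):
        w = h @ vectors[-1]
        vectors.append(w / np.linalg.norm(w))   # normalize to avoid overflow
    # Orthonormal basis of the Krylov space (stand-in for the Lanczos recurrence).
    q, _ = np.linalg.qr(np.column_stack(vectors))
    h_projected = q.conj().T @ h @ q
    return np.linalg.eigvalsh(h_projected)[0]

rng = np.random.default_rng(seed=1)
a = rng.normal(size=(64, 64))
h = (a + a.T) / 2                      # random symmetric "Hamiltonian" (assumption)
v0 = rng.normal(size=64)               # start vector with generic ground-state overlap

exact = np.linalg.eigvalsh(h)[0]
estimates = [krylov_ground_energy(h, v0, k) for k in (2, 4, 8, 16)]
```

Each estimate is an upper bound on the exact ground state energy, and enlarging the Krylov dimension can only lower it; how fast it converges depends on the spectrum and on the overlap of `v0` with the ground state.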

https://medium.com/qiskit/a-mainstay-of-classical-ground-state-algorithms-gets-a-quantum-speedup-8003687dbb0d

###### *Resource Estimation for Quantum Algorithms*

https://quantumcomputing.stackexchange.com/questions/21518/review-paper-on-depth-qubits-and-t-gates-number-on-cliffordt-decomposition-f

from Table 1 in arxiv.org/abs/2012.03819, pricing commonly used non-trivial options (autocallables and TARFs) using Monte Carlo simulations on a quantum computer takes ~10 billion gates

https://arxiv.org/abs/2012.03819: A Threshold for Quantum Advantage in Derivative Pricing



###### *Quantum Applications*

The Anatomy of Quantum Advantage

The best known quantum advantage comes from the quantum Fourier transform, used in Shor's algorithm and in quantum phase estimation (Grover's algorithm, by contrast, relies on amplitude amplification). Here is why.

The Fourier transform approximates a given periodic signal with simpler periodic signals with a given set of frequencies. It is used in image, video, and radio transmissions, to name just a few areas we interact with every day. As the image shows, if we apply the (inverse) Fourier transform to a signal that has a frequency of 4.7, we get increasingly better approximations for its frequency. With three qubits, we have 8 frequencies available (0 through 7). The first step indicates that the frequency is in the second half of the range (those frequencies have a higher probability). The second step assigns a higher probability to the first half of the second half (4 and 5). The third and last step gives more probability to 5, with the ratio of the probabilities of 4 and 5 being correlated with the decimal part of 4.7.

This is an amazing procedure, a probabilistic version of a binary search (for an integer frequency it acts like a binary search, as I pointed out in a previous post).
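
The 4.7 example can be reproduced classically for three qubits. The sketch below assumes the state before the transform is the uniform phase ramp $\frac{1}{\sqrt{8}}\sum_k e^{2\pi i \cdot 4.7\, k/8}|k\rangle$ and uses NumPy's unitary-normalized FFT as a stand-in for the inverse quantum Fourier transform (sign conventions differ between texts). It only computes the final measurement probabilities, not the gate-by-gate circuit.

```python
import numpy as np

n_qubits = 3
N = 2 ** n_qubits
frequency = 4.7

# State before the inverse QFT: phases advancing at the unknown frequency.
k = np.arange(N)
signal = np.exp(2j * np.pi * frequency * k / N) / np.sqrt(N)

# Unitary DFT (norm="ortho") plays the role of the inverse QFT here.
amplitudes = np.fft.fft(signal, norm="ortho")
probabilities = np.abs(amplitudes) ** 2

# "Binary search" reading of the result, most significant bit first.
p_second_half = probabilities[N // 2:].sum()        # outcomes 4..7
p_four_or_five = probabilities[4] + probabilities[5]
most_likely = int(np.argmax(probabilities))

for outcome, p in enumerate(probabilities):
    print(f"{outcome}: {p:.3f}")
```

The distribution peaks at 5, with 4 as the runner-up, and the remaining probability is spread over the other outcomes following the discrete sinc-like profile mentioned in the text.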

The fast Fourier transform (FFT) is a version of the algorithm that is extremely efficient. If you look at its implementation, it pretty much makes every step look like a quantum gate (a combination of Hadamard and Phase gates). Its structure is the essence of how quantum computing works. Even better, the quantum implementation takes advantage of quantum parallelism, enabled by decomposing operations by digit/qubit, to make the transformation exponentially faster.

It is worth understanding the details of how the advantage works. Decomposing by digits is an essential ingredient, that coupled with interference allows quantum parallelism to provide an exponential speed-up. The combinatorial structure is the key, and the place where other advantage sources may be found in the future. I also think the discrete sinc distribution can play a bigger role in quantum computing.


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1642.png)

[(2023) Quantum Computing for High-Energy Physics: State of the Art and Challenges](https://arxiv.org/abs/2307.03236)

**What is an over-the-counter (OTC) derivative?**

According to Investopedia, an over-the-counter (OTC) derivative is a financial contract that does not trade on an asset exchange, and which can be tailored to each party's needs.

A derivative is a security with a price that is dependent upon or derived from one or more underlying assets. Its value is determined by fluctuations in the underlying asset.

Pasqal describes how its quantum neural network methods apply to derivative pricing:

https://www.pasqal.com/industry/finance

**Quantum Effects in Everyday Life**

*Casimir effect from quantum (zero point) energy* [Source](https://www.youtube.com/watch?v=Rh898Yr5YZ8)

  * Geckos climb using van der Waals forces, essentially the same thing as the Casimir force: a gecko's feet are covered with setae (microscopic hairs)

  * These hairs split into millions of spatula-shaped ends that are around 0.2 micrometers in diameter

  * When a gecko presses its feet onto any surface, a fraction of these hairs are close enough to the surface for Casimir forces to come into effect

  * In this picture, the gecko is literally manipulating quantum vacuum energy to climb walls

*Quantum Tunneling* [Source](https://youtu.be/1zDzbXbt7Hc?t=537)

  * Microchips: quantum tunneling of electrons through barriers of less than 1 nanometer limits the minimum size of electronic components in microchips

  * Electrons in the process of photosynthesis may move via quantum tunneling (hypothesis)

  * Quantum tunneling may be responsible for aging and cancer (hypothesis)

* Laser: in fiber optic cables https://youtu.be/Gbfj_FlEjI4

**Chemistry**

* https://www.spektrum.de/news/ki-findet-neue-struktur-von-wasserstoff/2132715

**Astronomy**

* Optical interferometry

**Finance**

What was said: the most relevant application domains of quantum computing in financial services are
* Portfolio optimization, for example to maximize returns and minimize volatility.
* Scenario simulations to price a given asset or to make more accurate market forecasts.
* Machine learning to analyze large data streams faster and with greater precision.
* Cybersecurity to mitigate the threat of a quantum attack on the cryptographic protocols used to protect banking transactions.

**Learning Material**

Online Course: [Quantum Computing](https://youtube.com/playlist?list=PLnK6MrIqGXsL1KShnocSdwNSiKnBodpie)

https://www.youtube.com/watch?v=Kk2zMpDaPgs&list=WL&index=8

https://youtu.be/mD6m-XkDgTY

https://youtu.be/SIehCk3WQEg

https://youtu.be/JpRO7yXbFYE

https://youtu.be/gY5lisUprLg

https://youtu.be/XzcTb11CTgE

https://youtu.be/Pq2DVlru0UI

https://youtu.be/i-sCFVt52bI

https://youtu.be/40OFnZTPqyc

[Efficient quantum algorithm for solving travelling salesman problem: An IBM quantum experience](https://arxiv.org/abs/1805.10928)

> **Quantum Sensing**

Terrestrial Very-Long-Baseline Atom Interferometry

https://indico.cern.ch/event/1208783/

https://webcast.web.cern.ch/event/i1208783

Two extensive works on quantum metrology.

https://phys.au.dk/fileadmin/user_upload/Phd_thesis/AlexanderHolmKiilerich_thesis.pdf

https://www.tdx.cat/bitstream/handle/10803/371132/bgc1de1.pdf?sequence=1

I am excited to share that our work on sensitive detection of spins using superconducting circuits has been published in Science Advances! Spin resonance spectroscopy is an important technique in fields as diverse as materials science, physics, chemistry and biology. In this study we demonstrate a device capable of beating the spin detection sensitivity of conventional spectrometers by many orders of magnitude. The performance matches previous demonstrations using superconducting quantum technologies, but importantly integrates the core components into a single device and works at higher magnetic fields and temperatures, which is critical for the practical use of this technology. See the paper and a brief explainer below.

Wyatt Vine, Mykhailo S., Arjen Vaartjes, Anders Kringhøj, James Slack-Smith, Thomas Schenkel, Brett Johnson, Andrea Morello

https://lnkd.in/gf5-h8Bc

https://lnkd.in/g-Rsp6Kb

[Quantum Sensing](https://en.m.wikipedia.org/wiki/Quantum_sensor) **Differentiation**

* [Interferometry](https://en.m.wikipedia.org/wiki/Interferometry): technique which uses the interference of superimposed waves to extract information

  * [Atom Interferometer](https://en.m.wikipedia.org/wiki/Atom_interferometer): interferometer which uses the wave character of atoms.

  * Electromagnetic Wave Interferometry

    * [Astronomical_optical_interferometry](https://en.m.wikipedia.org/wiki/Astronomical_optical_interferometry)


**Astronomical Optical Interferometry**

* https://www.quantamagazine.org/famous-quantum-experiment-offers-hope-for-earth-size-telescope-20210505/
  * radio interferometry: relatively easy in radio astronomy, both because radio-emitting objects tend to be extremely bright, and because radio waves are relatively large and thus easy to line up.
  * Optical interferometry is much harder. Visible wavelengths measure hundreds of nanometers long, leaving far less room for error in aligning waves according to when they arrived at different telescopes. Moreover, optical telescopes build images photon-by-photon from very dim sources. It’s impossible to save these grainy signals onto normal hard drives without losing information that’s vital for doing interferometry.
  * “If there was a way of recording photon events at an optical telescope with some kind of quantum device, that would be a great boon to the science.”
* https://en.m.wikipedia.org/wiki/Astronomical_optical_interferometry
* https://en.wikipedia.org/wiki/Photometric_system
* https://en.wikipedia.org/wiki/List_of_astronomical_interferometers_at_visible_and_infrared_wavelengths

**Appendix**

Quantum sensing: sensing changes in motion, and in electric and magnetic fields. The analyzed data is collected at the atomic level.

https://www.baesystems.com/en-us/definition/what-is-quantum-sensing

https://en.m.wikipedia.org/wiki/Quantum_sensor

https://en.m.wikipedia.org/wiki/Quantum_metrology

https://en.m.wikipedia.org/wiki/Quantum_imaging

diamond-based quantum sensors that can detect magnetic and electric fields
* https://www.iaf.fraunhofer.de/en/researchers/quantum-systems/quantum-sensors.html
* https://www.iaf.fraunhofer.de/en/media-library/press-releases/world-s-first-measurement-of-magnetic-field-dependent-stimulated.html
* https://www.iaf.fraunhofer.de/en/media-library/press-releases/more-precise-diagnoses-and-personalized-therapies-due-to-hyperpolarized-nuclear-magnetic-resonance.html

**Quantum Sensing**

https://www.cosmos-indirekt.de/News/Neuartiger_Quantenzustand_durch_exotisches_Wechselspiel_der_Elektronen.html

https://singularityhub.com/2022/10/31/new-3d-quantum-accelerometer-leaves-classical-sensors-in-the-dust/

https://de.m.wikipedia.org/wiki/Beschleunigungssensor

Quantum sensing technology is capable of creating new measurement principles or sitting on top of conventional measurement technology and upgrading it (either as a component or as a theoretical concept).
This paper by Reichert et al. in npj Quantum Information is a prime example of such improvements!

Finance: https://www.mckinsey.com/industries/financial-services/our-insights/how-quantum-computing-could-change-financial-services

A review on **classical machine learning methods to characterize quantum systems**, stemmed from a wide collaboration of experts in the field, is now out in its final revision on Nat Rev Phys! Very proud to have been part of the team :)

https://arxiv.org/pdf/2207.00298.pdf

https://www.nature.com/articles/s42254-022-00552-1

Applications: **Detecting dark matter with quantum computers**

* When dark matter particles traverse a strong magnetic field, they may produce photons that Chou and his team can measure with superconducting qubits inside aluminum photon cavities. Because the qubits have been shielded from all other outside disturbances, when scientists detect a disturbance from a photon, they can infer that it was the result of dark matter flying through the protective layers.

* “These disturbances manifest as errors where you didn’t load any information into the computer, but somehow information appeared, like zeroes that flip into ones from particles flying through the device,” he said.

https://news.fnal.gov/2022/12/detecting-dark-matter-with-quantum-computers/

Applications: **Generative Machine Learning with Classical Data**

* Quantum Advantage with QML Models:

* GANs (famous for collapsing) and VAE (issue of memorizing data and overfitting)

* Probabilistic generative models seem a more promising candidate for quantum advantage.

* https://www.youtube.com/watch?v=uGpbP1tRDyQ&list=WL&index=3&t=465s

* https://arxiv.org/abs/1708.09757

* Problem: lack of a general metric (also in classical ML); for example, the "inception score" only works for images

* Many of the evaluation techniques, e.g., divergences, favor memorization
  - State-of-the-art models, e.g., VAEs and GANs, don't possess tractable likelihood functions.
  - Scales badly in high dimensions
  - Obscures distinct modes of failure into a single (uninterpretable) number

* Possible solution: https://arxiv.org/abs/2201.08770 Evaluating Generalization in Classical and Quantum Generative Models

* https://learnopencv.com/variational-autoencoder-in-tensorflow/

* VAE in Astrophysics: We propose to use generative models based on deep neural networks, namely variational autoencoders (VAE), to learn probabilistic models directly from data. We train a VAE on images of centred, isolated galaxies, which we reuse, as a prior, in a second VAE-like neural network in charge of deblending galaxies. https://academic.oup.com/mnras/article/500/1/531/5919458

###### *Exponential vs Quadratic Speedups*

https://arxiv.org/abs/2011.04149

###### *Quantum Topological Data Analysis*

**Essential Steps in the qTDA Paper**

* **Connect Laplacian with TDA**:
  * The harmonic part of the spectrum of the persistent Laplacian (its zero eigenvalues, counted with multiplicity) fully captures the homological persistence and can thus be used to compute the persistent Betti numbers of filtrations of simplicial complexes. ("Harmonic" here refers to the kernel of the Laplacian, not to whole-number multiples of a fundamental frequency as in signal processing.)
  * The eigenvalues of the combinatorial Laplacian can be used to measure the topological features of the data. For example, the multiplicity of the zero eigenvalue of the 0th Laplacian is the number of connected components in the data, and the multiplicity of the zero eigenvalue of the 1st Laplacian is the number of one-dimensional holes. The nonzero eigenvalues carry geometric rather than homological information; e.g., the smallest nonzero eigenvalue of the graph Laplacian measures how well connected the data is.
  * Specifically, the **dimension of the kernel of the combinatorial Laplacian for k-simplices is the kth Betti number**. (The kernel here is the null space of the operator, not a similarity function between two data points as in kernel methods.) In other words, the number of linearly independent vectors annihilated by the kth Laplacian is equal to the number of k-dimensional holes in the data set.
  
* The kernel of the graph Laplacian (the combinatorial Laplacian for $k=0$) specifically captures connectedness: the kernel (or null space) of the Laplacian captures the number of connected components in the graph. The graph Laplacian is defined for a graph $G$ with vertex set $V$ and edge set $E$ as $L = D - A$, where $D$ is the degree matrix and $A$ is the adjacency matrix. **The number of zero eigenvalues of the graph Laplacian equals the number of connected components of the graph**. The eigenvectors associated with the zero eigenvalues (which form the kernel of the Laplacian) provide **harmonic functions on the graph**.

* But: combinatorial Laplacian vs persistent Laplacian: persistent homology provides information at multiple scales, while the kernel of the combinatorial Laplacian describes a single fixed complex (for a graph, its connected components).

* The persistent Laplacian extends the combinatorial Laplacian to a pair of nested complexes $K \subseteq L$ in a filtration. Its **harmonic part** is the eigenvalue zero together with its eigenspace (the kernel), and the dimension of this kernel is the persistent Betti number of the pair, i.e. the number of holes of $K$ that still persist in $L$.
  * The non-zero eigenvalues form the non-harmonic part of the spectrum; they carry additional geometric information but are not needed to count holes.
  * this is why we were using the **Chebyshev polynomial to optimally filter the zero eigenvalues** (a spectral projector that distinguishes the zero eigenvalue from the remaining non-zero eigenvalues of the Dirac operator)
  * PS: a spectral projector is a projection operator onto the eigenspace of a self-adjoint operator for a particular eigenvalue. Projecting onto the zero eigenspace of the Laplacian isolates the harmonic vectors, so the trace (rank) of this projector is the Betti number. The same projector describes the long-time limit of the heat equation $\partial_t u = -\Delta u$, whose solutions converge to the harmonic part of the initial condition.

**Kernel and Spectral Gap**

* There is a connection between kernel and spectral gap. In this context the spectral gap is the smallest non-zero eigenvalue of the Laplacian, i.e. the distance between the zero eigenvalue and the rest of the spectrum. A large spectral gap means that the zero eigenvalue is well separated from the non-zero ones, which makes the kernel easier to isolate.

* The kernel of a matrix is the set of all vectors that the matrix maps to the zero vector. For a symmetric matrix it is spanned by the eigenvectors with eigenvalue 0.

* The spectral gap does not determine the dimension of the kernel; it determines **how hard it is to tell kernel vectors apart from vectors with small non-zero eigenvalues**. The smaller the gap, the higher the precision (phase estimation) or the polynomial degree (eigenvalue filtering) needed to count the zero eigenvalues reliably.

First Eigenvalue in TDA:
* "In ordinary TDA applications one is typically concerned with the computation of the exact count of zero eigenvalues of combinatorial Laplacians"
* For the graph Laplacian this count is the number of connected components (the pieces that cannot be broken up any smaller).
* "deciding if a combinatorial Laplacian has a trivial or non-trivial kernel (i.e., Betti number zero or non-zero)"

Cheeger's inequality demonstrates that the magnitudes of the small non-zero eigenvalues of the graph Laplacian characterise the connectedness of the graph [20], and similar results hold for combinatorial Laplacians.

Higher Eigenvalues and why filter out first Eigenvalue:
* Compute spectral gap, large gap = dense graph connectivity
* Filtrations in Persistent Homology: In TDA, one often constructs filtrations, which are sequences of growing simplicial complexes. Monitoring how the first non-zero eigenvalue changes throughout a filtration can give insights into how the topological features of the data evolve across scales.
* Robustness and Sensitivity: The magnitude of the first non-zero eigenvalue can indicate the robustness or sensitivity of the data's topological features. Smaller values might suggest features that are more sensitive to noise or perturbations.
* Dimensionality Reduction: Techniques like Laplacian eigenmaps use the first few eigenvalues and their associated eigenvectors for dimensionality reduction. The first non-zero eigenvalue plays a critical role in capturing the most significant variance or structure in the data in such methods.
* Clustering and Segmentation: The first non-zero eigenvalue and its corresponding eigenvector can be useful in clustering or segmenting data. Methods like spectral clustering leverage this property.
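
The role of the first non-zero eigenvalue in clustering can be illustrated on a toy graph. The following is a minimal sketch, assuming a hand-made graph of two triangles joined by one bridge edge; the helper `graph_laplacian` and the example graph are illustrative assumptions, not taken from the qTDA paper.

```python
import numpy as np

def graph_laplacian(n, edges):
    """Return L = D - A for an undirected graph on n vertices."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
L = graph_laplacian(6, edges)

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
spectral_gap = eigvals[1]             # first non-zero eigenvalue (graph is connected)
fiedler = eigvecs[:, 1]               # its eigenvector (Fiedler vector)

# The sign pattern of the Fiedler vector splits the graph at the bridge.
labels = (fiedler > 0).astype(int)
print("spectral gap:", spectral_gap)
print("cluster labels:", labels)
```

The small gap reflects the weak connection through the bridge; adding more edges between the two triangles would increase it.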


[Hodge theory](https://de.m.wikipedia.org/wiki/Hodge-Zerlegung): the combinatorial Laplacian (a linear operator) is a generalization of the graph Laplacian
* The key idea in Hodge theory is that every cohomology class of a compact oriented Riemannian manifold **has exactly one harmonic representative**, and that the space of differential forms decomposes into an orthogonal direct sum of exact, co-exact, and harmonic forms (the Hodge decomposition).
* A harmonic form is a differential form $\omega$ that satisfies the Laplace equation $\Delta \omega = 0$, where $\Delta = d\delta + \delta d$ is the Hodge Laplacian.
* A central consequence is the Hodge theorem: the $k$-th de Rham cohomology group is isomorphic to the space of harmonic $k$-forms. The discrete analogue is the statement used in TDA: the kernel of the $k$-th combinatorial Laplacian is isomorphic to the $k$-th homology group.
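
The discrete version of the Hodge decomposition can be checked numerically. The sketch below assumes a small hand-made complex (four vertices, five edges, one filled triangle) and a random edge flow; the boundary matrices `B1`, `B2` and all variable names are illustrative assumptions.

```python
import numpy as np

# Vertices 0..3; edges oriented from low to high index; one filled triangle (0,1,2).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1 = np.zeros((4, len(edges)))  # boundary map: edges -> vertices
for col, (u, v) in enumerate(edges):
    B1[u, col], B1[v, col] = -1.0, 1.0

# Boundary of the triangle (0,1,2) is (0,1) - (0,2) + (1,2).
B2 = np.array([[1.0], [-1.0], [1.0], [0.0], [0.0]])  # triangles -> edges

L1 = B1.T @ B1 + B2 @ B2.T  # combinatorial (Hodge) Laplacian on edges

rng = np.random.default_rng(0)
f = rng.normal(size=len(edges))  # arbitrary edge flow

# Orthogonal projections onto im(B1^T) (gradient part) and im(B2) (curl part).
f_grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]
f_curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]
f_harm = f - f_grad - f_curl  # the remainder is harmonic

print("L1 @ f_harm:", np.round(L1 @ f_harm, 12))
```

The harmonic part is supported around the unfilled cycle 1-2-3, which is the only hole of this complex.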

*Homology vs Cohomology*

Homology and cohomology are two closely related concepts in mathematics that are used to study the topology of a space. They are both defined using chain complexes, but they differ in the way that the chain complexes are constructed.

In homology, the chain complex is built from simplices, which are geometric objects that can be thought of as building blocks for a space; formal linear combinations of simplices are called chains. In cohomology, the cochain complex is built from cochains, which are linear functions on the chains.

**The homology groups of a space are the groups of equivalence classes of cycles in the chain complex**. A cycle is a chain whose boundary is zero, for example a collection of edges that forms a closed loop; two cycles are equivalent if they differ by a boundary. The cohomology groups of a space are the groups of equivalence classes of cocycles in the cochain complex. A cocycle is a cochain whose coboundary is zero.

The main formal difference between homology and cohomology is that homology is a covariant functor, while cohomology is a contravariant functor. This means that a continuous map $f: X \to Y$ induces maps $H_k(X) \to H_k(Y)$ in homology, but maps $H^k(Y) \to H^k(X)$ in the opposite direction in cohomology.

Intuitively, homology can be thought of as a way of counting the number of holes in a space, while cohomology can be thought of as a way of measuring functions on those holes: a cocycle assigns a number to every cycle.

Here is a table that summarizes the key differences between homology and cohomology:

| Feature | Homology | Cohomology |
|---|---|---|
| Construction | Simplicial chain complex | Cochain complex |
| Equivalence classes | Cycles | Cocycles |
| Functor type | Covariant | Contravariant |
| Interpretation | Holes in a space | Functions on the holes |


*Quantifying Quantum Advantage in Topological Data Analysis*

https://arxiv.org/abs/2209.13581

QIP2023 | Quantifying Quantum Advantage in Topological Data Analysis (Vedran Dunjko)
https://youtu.be/T8ygjc-Lpn0

What does the following mean in the context of quantum computing: "Block encoding & qubitization instead of time evolution. Use filtering of the zero eigenvalue rather than phase estimation."

To understand this, let's first break down some key concepts in quantum computing:

1. **Block Encoding**: In quantum computing, block encoding is a method of representing a matrix in a larger, unitary matrix so that the original matrix becomes a sub-block of this larger matrix. This technique is used to allow us to perform certain operations more easily.

2. **Qubitization**: Qubitization is a technique that turns a block encoding of a Hermitian matrix into a quantum walk operator whose eigenphases are $\pm\arccos\lambda$ for the eigenvalues $\lambda$ of the matrix. Polynomials of the matrix can then be applied with a number of queries equal to the polynomial degree, without first simulating time evolution.

3. **Time Evolution**: Time evolution in quantum computing typically refers to how a quantum state changes over time according to the Schrödinger equation. It is often used to simulate physical systems in quantum simulations.

4. **Phase Estimation**: Quantum Phase Estimation (QPE) is a fundamental algorithm in quantum computing which estimates the phase (or eigenvalue) of an eigenvector of a unitary operator. It's a key component in many quantum algorithms, such as Shor's algorithm and quantum simulation algorithms.

5. **Filtering the Zero Eigenvalue**: This refers to the process of isolating or eliminating states with zero eigenvalues. It can be a way of refining results or simplifying calculations in quantum algorithms.

Now, to the quoted statement: "Block encoding & qubitization instead of time evolution. Use filtering of the zero eigenvalue rather than phase estimation."

This essentially suggests a strategy for tackling a quantum computing task. Instead of using the time evolution operator (a more traditional approach to quantum simulations), the statement advocates the use of block encoding and qubitization. These methods can provide a more resource-efficient approach for certain types of quantum computations.

Similarly, "using filtering of the zero eigenvalue rather than phase estimation" suggests a preferred way of isolating the states of interest. While phase estimation is a powerful tool, it can be resource-intensive. Filtering the zero eigenvalue can be more efficient in this context, because the task only requires distinguishing the zero eigenvalue (the kernel, whose dimension is the Betti number) from all non-zero eigenvalues, not estimating each eigenvalue.
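
To make the notion of a block encoding concrete, here is a minimal numerical sketch. It assumes a small random Hermitian matrix rescaled to norm 1/2 and uses one textbook construction, $U = \begin{pmatrix} A & \sqrt{I - A^2} \\ \sqrt{I - A^2} & -A \end{pmatrix}$; the function name `block_encode` is an illustrative assumption, and practical circuits usually build block encodings differently (e.g. via linear combinations of unitaries).

```python
import numpy as np

def block_encode(A):
    """Embed a Hermitian A with ||A|| <= 1 as the top-left block of a unitary."""
    eigvals, V = np.linalg.eigh(A)
    # sqrt(I - A^2), computed in the eigenbasis of A (clipped against rounding).
    S = (V * np.sqrt(np.clip(1.0 - eigvals**2, 0.0, None))) @ V.conj().T
    return np.block([[A, S], [S, -A]])

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2
A /= 2 * np.linalg.norm(A, 2)  # rescale so that the spectral norm is 1/2

U = block_encode(A)
print("U unitary:", np.allclose(U @ U.conj().T, np.eye(8)))
print("top-left block equals A:", np.allclose(U[:4, :4], A))
```

The larger matrix `U` is unitary and can in principle be implemented as a circuit, while `A` itself generally is not unitary.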

**Excursus: how can you use the Chebyshev polynomial to optimally filter the zero eigenvalues?**

The Chebyshev polynomials are a family of orthogonal polynomials that can be used to filter the zero eigenvalues of a matrix. The Chebyshev polynomials are defined by the following recursion:

$$T_0(x) = 1$$
$$T_1(x) = x$$
$$T_n(x) = 2xT_{n-1}(x) - T_{n-2}(x)$$

for all integers $n \geq 2$.

The Chebyshev polynomials are orthogonal on the interval $[-1,1]$ with respect to the weight $1/\sqrt{1-x^2}$. This means that for any two Chebyshev polynomials $T_n(x)$ and $T_m(x)$, the following inner product is zero:

$$\int_{-1}^1 \frac{T_n(x) T_m(x)}{\sqrt{1-x^2}} \, dx = 0$$

where $n \neq m$.

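For filtering, the more important property is that $|T_n(x)| \leq 1$ on $[-1,1]$, while outside this interval $T_n(x)$ grows faster than any other polynomial of the same degree that is bounded by 1 on $[-1,1]$. Suppose the non-zero eigenvalues of a positive semidefinite matrix $A$ lie in $[\delta, \lambda_{\max}]$, where $\delta$ is the spectral gap. Map this interval affinely onto $[-1,1]$ and normalize at zero:

$$p_n(x) = \frac{T_n\left(\frac{\lambda_{\max} + \delta - 2x}{\lambda_{\max} - \delta}\right)}{T_n\left(\frac{\lambda_{\max} + \delta}{\lambda_{\max} - \delta}\right)}$$

Then $p_n(0) = 1$ and $|p_n(x)| \leq 1/T_n\left(\frac{\lambda_{\max}+\delta}{\lambda_{\max}-\delta}\right)$ on $[\delta, \lambda_{\max}]$, so $p_n(A)$ leaves the zero eigenvalues untouched and suppresses all non-zero eigenvalues. It approximates the spectral projector onto the kernel.

The required degree $n$ depends on the matrix $A$ through its spectral gap: to suppress the non-zero eigenvalues below $\epsilon$, a degree of order $\sqrt{\lambda_{\max}/\delta}\,\log(1/\epsilon)$ suffices.

**The Chebyshev polynomial filter is a powerful tool for isolating the zero eigenvalues of a matrix**. It is simple to implement through the three-term recursion, and among all polynomials of degree $n$ with $p(0)=1$ it has the smallest maximum on $[\delta, \lambda_{\max}]$. The Chebyshev polynomial filter has been used in a variety of applications, including image processing, machine learning, and numerical analysis.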
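
A classical toy version of such a filter can be sketched as follows. It assumes a polynomial $p$ with $p(0)=1$ that is small on $[\delta, \lambda_{\max}]$, built by shifting and rescaling $T_n$, and applies it to the Laplacian of two disjoint 3-vertex paths (non-zero eigenvalues in $[1, 3]$). The function name `zero_eigenvalue_filter` and the example graph are illustrative assumptions; the quantum algorithm applies the polynomial to a block encoding rather than to a dense matrix.

```python
import numpy as np

def zero_eigenvalue_filter(L, gap, lam_max, degree):
    """Return p(L) with p(0) = 1 and |p| tiny on [gap, lam_max]. Needs degree >= 1."""
    identity = np.eye(L.shape[0])
    # Affine map sending [gap, lam_max] onto [-1, 1]; the eigenvalue 0 lands at x0 > 1.
    Y = ((lam_max + gap) * identity - 2.0 * L) / (lam_max - gap)
    x0 = (lam_max + gap) / (lam_max - gap)
    T_prev, T_cur = identity, Y  # T_0(Y), T_1(Y)
    t_prev, t_cur = 1.0, x0      # T_0(x0), T_1(x0)
    for _ in range(degree - 1):
        T_prev, T_cur = T_cur, 2.0 * Y @ T_cur - T_prev
        t_prev, t_cur = t_cur, 2.0 * x0 * t_cur - t_prev
    return T_cur / t_cur         # normalize so that p(0) = 1

# Graph Laplacian of two disjoint paths 0-1-2 and 3-4-5 (eigenvalues 0, 1, 3 each).
path = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
L = np.block([[path, np.zeros((3, 3))], [np.zeros((3, 3)), path]])

P = zero_eigenvalue_filter(L, gap=1.0, lam_max=3.0, degree=20)
betti0_estimate = np.trace(P)  # trace of the approximate kernel projector
print("estimated number of connected components:", round(betti0_estimate, 6))
```

The trace of the filtered matrix approximates the kernel dimension, here the number of connected components.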

<font color="blue">***Kernel of Combinatorial Laplacians and Betti numbers***

> The kernel of the $k$-th combinatorial Laplacian is isomorphic to the $k$-th homology group of the simplicial complex. For $k=0$ (the graph Laplacian) this means that the dimension of the kernel is equal to the number of connected components of the simplicial complex.

**Persistent homology and the kernel of a combinatorial Laplacian** both arise in the study of topological and geometric properties of data, but they address different aspects and are based on somewhat different concepts.

*Kernel of a Combinatorial Laplacian:*

1. **Purpose**: The graph Laplacian (the combinatorial Laplacian for $k=0$) is a matrix associated with a graph that gives insights into the properties and structure of the graph. The kernel (or null space) of the Laplacian captures the number of connected components in the graph.

2. **Construction**: The graph Laplacian is defined for a graph $G$ with vertex set $V$ and edge set $E$ as $L = D - A$, where $D$ is the degree matrix and $A$ is the adjacency matrix.

3. **Output**: The number of zero eigenvalues of the combinatorial Laplacian equals the number of connected components of the graph. The eigenvectors associated with the zero eigenvalues (which form the kernel of the Laplacian) provide harmonic functions on the graph.
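
The three points above can be verified on a small example. This is a minimal sketch, assuming a hand-made graph with three connected components; the helper `graph_laplacian` and the tolerance `1e-9` are illustrative choices.

```python
import numpy as np

def graph_laplacian(n, edges):
    """Return L = D - A for an undirected graph on n vertices."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

# Components: {0, 1, 2}, {3, 4} and the isolated vertex {5}.
edges = [(0, 1), (1, 2), (3, 4)]
L = graph_laplacian(6, edges)

eigvals = np.linalg.eigvalsh(L)
num_components = int(np.sum(np.abs(eigvals) < 1e-9))  # multiplicity of eigenvalue 0
print("connected components:", num_components)
```

Each zero eigenvalue has an eigenvector that is constant on one component and zero elsewhere.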

*Commonalities:*

1. **Topology**: Both are tools used to study topological or geometric properties of spaces (or data). While persistent homology studies a wide range of topological features across scales, <font color="blue">the kernel of the combinatorial Laplacian specifically captures connectedness.</font>

2. **Graph-based**: Both can be discussed in the context of graphs or networks. Simplicial complexes in persistent homology can be viewed as higher-dimensional generalizations of graphs.

*Differences:*

1. **Scale**: Persistent homology provides information at multiple scales, while the kernel of the combinatorial Laplacian is typically focused on the connected components of a fixed graph.

2. **Features**: While both can capture the number of connected components, persistent homology can also capture other features like loops and voids.

3. **Mathematical machinery**: Persistent homology involves techniques from algebraic topology and often requires more sophisticated computational tools. The combinatorial Laplacian involves spectral graph theory and linear algebra.

4. **Applications**: While both are used in data analysis, persistent homology has found a broader range of applications in studying the shape of data, while the Laplacian and its kernel are often used in graph theory, network analysis, and spectral clustering.

In summary, while both persistent homology and the kernel of the combinatorial Laplacian deal with topological features of spaces or data, they focus on different aspects and are based on somewhat different mathematical concepts.

* The persistent Laplacian is a generalization of the combinatorial Laplacian that can be used to compute persistent Betti numbers. Persistent Betti numbers track the birth and death of topological features of a simplicial complex along a filtration parameter. The dimension of the kernel of the persistent Laplacian is the persistent Betti number, in the same way that the dimension of the kernel of the combinatorial Laplacian is the ordinary Betti number.

* persistent Laplacian, an extension of the standard combinatorial Laplacian to the setting of pairs: https://epubs.siam.org/doi/10.1137/21M1435471



Computing Betti Numbers via Combinatorial Laplacians

https://dl.acm.org/doi/pdf/10.1145/237814.237985

Computing homology at present seems hard; current algorithms require computing the [Smith normal form](https://en.m.wikipedia.org/wiki/Smith_normal_form) of the matrix. The latter can be done in polynomial time ([KB79]), but the current polynomial required is still quite large.

Normal form: https://en.m.wikipedia.org/wiki/Canonical_form

**why is the kernel also the Betti number in topological data analysis?**

* The kernel is not itself the Betti number: **the kernel is a vector space (the null space of the Laplacian), and the Betti number is its dimension**. The Betti number is a topological invariant that counts the holes of a given dimension in a data set. The word "kernel" here is unrelated to the kernel functions of machine learning, which measure the similarity between two data points.

> The connection between the kernel and the Betti number is: **The dimension of the kernel of the combinatorial Laplacian for k simplices is the kth Betti number. In other words, the number of linearly independent vectors that the Laplacian maps to zero is equal to the number of $k$-dimensional holes in the data set.**

* The reason is the discrete Hodge theorem: every homology class contains exactly one harmonic representative, so the harmonic $k$-chains (the kernel of the $k$-th Laplacian) are in one-to-one correspondence with the $k$-th homology classes.

* Topological features can still be fed into kernel methods, but this requires a separately defined similarity function, for example a kernel on persistence diagrams, and not the null space of the Laplacian.

* Either way, Betti numbers and persistence can serve as features for machine learning tasks such as clustering, classification, and anomaly detection.

**what is the kernel of the combinatorial Laplacian?**

* The kernel of the combinatorial Laplacian is the set of all vectors that the Laplacian maps to zero, i.e. the span of its eigenvectors with eigenvalue 0. These vectors are called harmonic.

* The kernel of the combinatorial Laplacian is important in topological data analysis because the kernel of the $k$-th Laplacian is isomorphic to the $k$-th homology group of the underlying simplicial complex. This means that the kernel can be used to determine the Betti numbers of the simplicial complex, which are topological invariants that measure the number of connected components, holes, and voids in the complex.

* For example, for a circle (triangulated as a cycle graph) the kernel of the graph Laplacian $\Delta_0$ is one-dimensional and spanned by the constant vector, which corresponds to $\beta_0=1$: the circle is a connected topological space. The kernel of the edge Laplacian $\Delta_1$ is also one-dimensional, which corresponds to $\beta_1=1$: the circle encloses one hole.

* In general, the dimension of the kernel of the $k$-th combinatorial Laplacian is equal to the $k$-th Betti number, and the kernel of the full Laplacian acting on simplices of all dimensions has dimension equal to the sum of the Betti numbers. **This means that the kernel can be used to count the number of connected components, holes, and voids in a simplicial complex.**

* **The spectrum of the combinatorial Laplacian can also be used to define a graph kernel, which is a similarity measure between graphs.** Here "kernel" has the machine-learning meaning of a similarity function and not the meaning of null space. Graph kernels are used for tasks in machine learning such as graph classification and clustering.
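
The circle example can be reproduced numerically. The following sketch assumes the circle is triangulated as a hollow triangle and compares it with the filled triangle; the boundary matrices `B1`, `B2` and the helper `nullity` are illustrative assumptions.

```python
import numpy as np

def nullity(M, tol=1e-9):
    """Dimension of the kernel of a symmetric matrix M."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) < tol))

# Hollow triangle = triangulated circle: 3 vertices, 3 oriented edges.
edges = [(0, 1), (0, 2), (1, 2)]
B1 = np.zeros((3, 3))  # boundary map: edges -> vertices
for col, (u, v) in enumerate(edges):
    B1[u, col], B1[v, col] = -1.0, 1.0

L0 = B1 @ B1.T         # graph Laplacian (k = 0)
L1_hollow = B1.T @ B1  # edge Laplacian (k = 1), no 2-simplices present

# Filling in the triangle adds a 2-simplex with boundary (0,1) - (0,2) + (1,2).
B2 = np.array([[1.0], [-1.0], [1.0]])
L1_filled = B1.T @ B1 + B2 @ B2.T

betti0 = nullity(L0)
betti1_hollow = nullity(L1_hollow)
betti1_filled = nullity(L1_filled)
print(betti0, betti1_hollow, betti1_filled)
```

Filling the triangle removes the hole, so the kernel of the edge Laplacian becomes trivial while the number of connected components is unchanged.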


###### *Quantum supremacy using a programmable superconducting processor*

80 qubits: fidelity decreases

https://www.nature.com/articles/s41586-019-1666-5

* Original proposal from Aaronson: boson sampling. Google: random circuit sampling (RCS). In both it is exponentially hard to compute the output distribution (also for a quantum computer -> limits of quantum computing). But a quantum computer can sample efficiently from this distribution.
* Challenge: you can't go to 80 qubits or so, because then it is impossible for a classical computer to verify. Google used 53 qubits, because then you can at least verify (with 10,000 years on a classical computer, doing extrapolation). But this opens the door to spoofing the results classically (-> what did Aaronson mean by that?)
* Google did two things badly:
    * They didn't properly estimate the 10,000 years on a supercomputer (breaking the computation down and then extrapolating)
    * Google didn't use the best classical algorithm to compare with: later people used tensor networks to speed up the classical calculation and now it's down to a few seconds.
* But this raises the question of what supremacy means: in terms of electricity usage, quantum is still orders of magnitude better than classical in this example: kilowatt vs megawatt or so
* In terms of clock speed though, classical is equally fast for this experiment, because classical computing is highly parallelizable (many operations in parallel)


Spoofing: technical term for doing something fraudulently. How do I know that they ran the computation and got a result?
It is easy to sample random bits from the Porter-Thomas distribution.

page 3: https://www.cs.cmu.edu/~odonnell/quantum18/lecture25.pdf

For RCS we don't have an equivalent way of verifying this? KL divergence between the observed distribution and what it should look like

Spoofing: the classical circuit doesn't simulate, but does anything short of simulation. It can fool a verifier.
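
The verification problem can be illustrated with the linear cross-entropy benchmark, $F_{\mathrm{XEB}} = 2^n \langle p_{\text{ideal}}(x) \rangle - 1$, which is the kind of score used in the Google experiment. The sketch below is a toy model under stated assumptions: the ideal output state is a random complex Gaussian vector (whose probabilities follow the Porter-Thomas distribution) instead of a simulated circuit, and 12 qubits are used so that the ideal probabilities are cheap to compute.

```python
import numpy as np

rng = np.random.default_rng(0)
n_qubits = 12
dim = 2 ** n_qubits

# Stand-in for the output state of a random circuit (assumption, not a circuit simulation).
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
p_ideal = np.abs(psi) ** 2  # ideal output probabilities

def linear_xeb(samples):
    """Linear cross-entropy score: dim * mean ideal probability of the samples - 1."""
    return dim * p_ideal[samples].mean() - 1.0

n_samples = 200_000
honest = rng.choice(dim, size=n_samples, p=p_ideal)  # sampler that follows p_ideal
uniform = rng.integers(dim, size=n_samples)          # sampler that outputs random bitstrings

xeb_honest = linear_xeb(honest)
xeb_uniform = linear_xeb(uniform)
print("honest sampler:", xeb_honest)
print("uniform sampler:", xeb_uniform)
```

The honest sampler scores close to 1 and uniform noise close to 0. Computing `p_ideal` is the step that becomes exponentially expensive as the number of qubits grows, and a spoofer only needs to raise this score, not to sample from the correct distribution.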

###### *Quantum-inspired Algorithms, Tensor Networks and QAOA*

https://arxiv.org/abs/1807.04271

Ewin Tang is known for finding a classical algorithm that solves the recommendation systems problem nearly as efficiently (up to polynomial factors) as the best-known quantum algorithm. The quantum algorithm, by Kerenidis and Prakash, is based on quantum singular value estimation. Before Tang's work, no classical algorithm with a comparable running time (polylogarithmic in the matrix dimensions) was known.

Tang's dequantization approach takes advantage of the structure of the quantum algorithm. The quantum algorithm operates on matrices that have a good low-rank approximation and are stored in a data structure that allows efficient sampling, and Tang developed a classical algorithm that exploits the same structure.

Here's a simplified overview of Tang's algorithm:

1. **Input Data Conversion**: Transform the input data into a sparse, low-rank format. In the context of recommendation systems, this might mean constructing a matrix where the rows correspond to users, the columns correspond to products, and each entry reflects the user's preference for the product.

2. **Sampling Procedure**: Use a randomized sampling procedure to select a small number of "landmark" columns (products). The selection probability for each column is proportional to its contribution to the total preference scores.

3. **Low-Rank Approximation**: Construct a low-rank approximation of the original preference matrix based on the sampled columns.

4. **Estimation**: Use the low-rank approximation to estimate the user's preferences for all other products.

This is a simplified version of the process. The actual algorithm involves a lot of linear algebra, randomization, and complexity analysis. If you're interested in the full technical details, you can look up Ewin Tang's paper "A quantum-inspired classical algorithm for recommendation systems".

To summarize, the dequantization process involved taking a quantum algorithm and identifying the aspects of the quantum algorithm that could be implemented classically, thereby creating a classical version of the algorithm with similar efficiency.
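
The sampling and low-rank steps can be sketched with length-squared (squared-norm) column sampling in the style of Frieze, Kannan and Vempala. This is a simplified illustration and not Tang's algorithm itself: it reads the whole matrix to compute column norms and to form the final projection, whereas the actual algorithm avoids this through sampling access. The synthetic matrix, the sizes, and the function name `sampled_low_rank` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical preference matrix: 200 users x 500 products, rank 3 plus small noise.
m, n, k = 200, 500, 3
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))

def sampled_low_rank(A, k, s, rng):
    """Rank-k approximation of A built from s columns sampled by squared norm."""
    col_norms_sq = np.sum(A**2, axis=0)
    probs = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(A.shape[1], size=s, p=probs)
    C = A[:, idx] / np.sqrt(s * probs[idx])  # rescaled so that E[C C^T] = A A^T
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    U_k = U[:, :k]                           # approximate top-k left singular vectors
    return U_k @ (U_k.T @ A)                 # project A onto their span

A_hat = sampled_low_rank(A, k, s=40, rng=rng)
rel_error = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print("relative Frobenius error:", rel_error)
```

Only 40 of the 500 columns enter the singular value decomposition, yet the approximation error stays near the noise level because the matrix is close to low rank.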

**Quantum-inspired Algorithms**

* quantum computers still exhibit an exponential speedup over all known classical algorithms for **sparse, full-rank matrix problems**, including the quantum Fourier transform, eigenvector and eigenvalue analysis, linear systems, and others https://arxiv.org/abs/1905.10415
* Quantum algorithms for linear algebra are a flagship application of quantum computing, particularly due to their relevance in machine learning. These algorithms typically [scale polylogarithmically](https://en.wikipedia.org/wiki/Polylogarithmic_function) with dimension, which, at the time they were reported, implied an asymptotic exponential speedup compared to state-of-the-art classical methods. For this reason, significant interest has been generated in the dequantization approach that led to breakthrough quantum-inspired classical algorithms for linear algebra problems with sublinear complexity https://arxiv.org/abs/1905.10415

* https://crypto.stackexchange.com/questions/48638/whats-the-difference-between-polylogarithmic-and-logarithmic

* For example, matrix chain ordering can be solved in polylogarithmic time on a Parallel Random Access Machine.

**Dequantizing a quantum algorithm**
* Not all problems are amenable to quantum speed-up. Quantum algorithms are specifically efficient for problems where a superposition of solutions can be created and manipulated. Linear algebra problems, especially in high dimensions, may not easily lend themselves to such superposition-based speed-ups.
* **Output limitations**: Even if a quantum computer can perform a computation faster, it might not be able to provide the result in a way that's useful or can be checked in a reasonable time by a classical computer. This limitation is a crucial aspect in quantum linear algebra problems. For example, **a vector with $2^n$ entries can be encoded in the amplitudes of only $n$ qubits (assuming it can be loaded efficiently), but you cannot read out all the entries in less than exponential time in $n$**: each measurement collapses the state and yields only a tiny piece of the picture, and because of the 'no-cloning' theorem the state cannot be copied, so the preparation would have to be repeated an exponential number of times to see the full state.
* Tang's algorithm builds on a classical data structure (a binary tree storing partial sums of squared entries) that allows sampling the entries of a vector with probability proportional to their squared magnitude. It is used to approximate the product of the preference matrix with a vector that represents a new user's preferences. This approximation is good enough to recommend items in a way that's nearly as good as the quantum algorithm.
* The approach relies on a form of algorithm called "randomized sampling algorithms" and works by creating a probability distribution that closely matches the one that would have been created by a quantum computer. By cleverly sampling from this distribution, Tang's algorithm can estimate the answers to the recommendation problem almost as accurately as the quantum version.
* Tang's algorithm relies heavily on the specifics of the recommendation system problem: the user-product preference matrix is "low rank." Therefore, it doesn't mean that all quantum algorithms can be dequantized in the same way.
* https://arxiv.org/abs/1905.10415 with https://www.math.cmu.edu/~af1p/Texfiles/SVD.pdf
* There are two ways to compute the average of many values. A first method is to add all of them together and divide the result by the total number of values. An alternative approach is to select a few values at random, then compute their average using the first method. Of course, in the second case we get only an approximation of the true average, but in many cases that is good enough and can be done in significantly less time.
* **This is the strategy of quantum-inspired algorithms: the coefficients of the solution vector are inner products between vectors, which can be expressed as the average of a collection of numbers that depend on the entries of the vectors**. Instead of directly computing this average (which takes linear time) we select a few of the values at random and compute their average instead.
* The question is, how many values do we need to sample to get a good approximation? It turns out that **the number of samples grows with the precision of the approximation, the rank, and the condition number of the input matrix.**
* In practice, this estimation method is only faster than a direct calculation of the coefficients if (i) precision, rank, and condition number are small, and (ii) the input matrix has a colossal dimension.
* In practice, it pays off to be slightly more sophisticated: we keep each number with likelihood proportional to its value, such that large numbers are kept with high probability and small numbers with low probability. This is the basis of a technique called [rejection sampling](https://en.m.wikipedia.org/wiki/Rejection_sampling), **which is employed in quantum-inspired algorithms to sample from the solution vectors**. Once the approximate singular value decomposition has been obtained and the coefficients have been estimated, it is possible to query any entry of the solution vector in sublinear time.
* https://medium.com/xanaduai/everything-you-always-wanted-to-know-about-quantum-inspired-algorithms-38ee1a0e30ef
* https://cstheory.stackexchange.com/questions/42338/list-of-quantum-inspired-algorithms
* https://github.com/XanaduAI/quantum-inspired-algorithms
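
The averaging idea from the list above can be sketched for a single inner product. The example assumes importance sampling of indices with probability $x_i^2 / \lVert x \rVert^2$, so that the average of $\lVert x \rVert^2 \, y_i / x_i$ is an unbiased estimate of $\langle x, y \rangle$. The vectors, the sample count, and the function name `sampled_inner_product` are illustrative assumptions, and the norm of `x` is computed directly here although quantum-inspired algorithms assume it is available from a precomputed data structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)  # correlated with x, so the inner product is large

def sampled_inner_product(x, y, n_samples, rng):
    """Estimate <x, y> from n_samples indices drawn with probability x_i^2 / ||x||^2."""
    x_norm_sq = np.sum(x**2)  # assumed precomputed in the sampling-access model
    probs = x**2 / x_norm_sq
    idx = rng.choice(len(x), size=n_samples, p=probs)
    # Each term ||x||^2 * y_i / x_i has expectation sum_i x_i * y_i.
    return x_norm_sq * np.mean(y[idx] / x[idx])

exact = x @ y
estimate = sampled_inner_product(x, y, 20_000, rng)
print("exact:", exact, "estimate:", estimate)
```

The estimate uses a fifth of the entries and its error shrinks with the square root of the number of samples, independent of the vector length.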

*Tensor Networks*

https://iopscience.iop.org/article/10.1088/2058-9565/aaea94

https://pennylane.ai/qml/demos/tutorial_tn_circuits#huggins

**Tensor networks applied in quantum-inspired algorithms?** Solving Optimization Problems: Tensor networks can also be used to solve optimization problems. By mapping the problem onto a tensor network, we can use techniques like the Density Matrix Renormalization Group (DMRG) to find the optimal solution. This is particularly useful for problems where the number of variables is large, as is often the case in optimization problems.

**are Tensor Networks a form of quantum-inspired algorithm or not?**

Tensor networks are a mathematical tool widely used in many-body quantum physics and quantum information theory. They provide a way to efficiently represent and manipulate high-dimensional tensors (or high-dimensional arrays of numbers), which frequently occur in these fields.

Quantum many-body systems, which involve multiple interacting quantum particles, give rise to high-dimensional tensors. When these interactions are local, it turns out that the resulting state of the system can be represented more efficiently with a tensor network, allowing for computational tractability.

However, tensor networks are not inherently quantum-inspired algorithms. They are more of a framework or a technique for representing complex multi-dimensional data structures, which can be used in the development of algorithms for quantum and classical computations.

Where the quantum inspiration comes in is their application: tensor networks are extensively used to study quantum systems. For instance, they are used in algorithms for simulating quantum systems, understanding quantum entanglement, quantum error correction, and so on. They have also found applications beyond quantum physics, such as machine learning and artificial intelligence.

In summary, tensor networks themselves are not quantum-inspired algorithms, but they are a critical tool in the development and application of algorithms (including quantum-inspired ones) used in quantum physics and quantum computing.

* tensor networks are mathematical constructs that are used to represent and manipulate quantum states, particularly those with many parts or "qubits".

* In the context of quantum computing, each number in the tensor can represent an amplitude of a quantum state.

* A tensor network is a particular way of organizing and manipulating these tensors. In a tensor network, each tensor is represented as a node, and the indices of the tensor are represented as edges connecting the nodes. By organizing the tensors in this way, **certain computations can be performed more efficiently, particularly those involving high-dimensional tensors**.

* The real power of tensor networks comes from their ability to efficiently represent and manipulate certain types of quantum states, particularly those that exhibit some form of entanglement. **In many physical systems, and particularly in systems that are near their ground state, the quantum state of the system can be well approximated by a tensor network with a relatively simple structure**.

* This makes **tensor networks a very useful tool for simulating such systems on a classical computer**.

* For instance, the class of states that can be efficiently represented by a tensor network includes many states that arise in condensed matter physics and quantum chemistry, as well as some states that are used in quantum error correction and quantum computing.

* One common type of tensor network is the [Matrix Product State](https://en.m.wikipedia.org/wiki/Matrix_product_state) (MPS), which is used in the [Density Matrix Renormalization Group](https://en.m.wikipedia.org/wiki/Density_matrix_renormalization_group) (DMRG) method, a powerful method for simulating one-dimensional quantum systems.

  * The density matrix renormalization group (DMRG) is a numerical variational technique devised to obtain the [low-energy physics](https://en.m.wikipedia.org/wiki/Macroscopic_scale) of quantum many-body systems with high accuracy. As a variational method, DMRG is an efficient algorithm that attempts to find the lowest-energy matrix product state wavefunction of a Hamiltonian.

* Other types of tensor networks include the Projected Entangled Pair State (PEPS), the Multi-scale Entanglement Renormalization Ansatz (MERA), and others.
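The idea that a weakly entangled state has a tensor network with a "relatively simple structure" can be illustrated with a minimal NumPy sketch. It splits a state vector into a Matrix Product State by repeated singular value decompositions. The function names (`state_to_mps`, `mps_to_state`, `bond_dimensions`) and the tolerance `tol` are assumptions made for this illustration; this is not the DMRG algorithm itself, only the decomposition that MPS methods build on.

```python
import numpy as np

def state_to_mps(psi, n_qubits, tol=1e-12):
    """Split a state vector into a chain of tensors of shape (left bond, 2, right bond)."""
    tensors = []
    remainder = np.asarray(psi, dtype=complex).reshape(1, -1)
    for _ in range(n_qubits - 1):
        left_bond = remainder.shape[0]
        # Rows: (left bond, current qubit); columns: all later qubits
        matrix = remainder.reshape(left_bond * 2, -1)
        u, s, vh = np.linalg.svd(matrix, full_matrices=False)
        # Drop numerically zero singular values (hypothetical tolerance)
        keep = max(1, int(np.sum(s > tol)))
        tensors.append(u[:, :keep].reshape(left_bond, 2, keep))
        remainder = np.diag(s[:keep]) @ vh[:keep, :]
    tensors.append(remainder.reshape(remainder.shape[0], 2, 1))
    return tensors

def mps_to_state(tensors):
    """Contract the chain back into a full state vector."""
    result = tensors[0]
    for tensor in tensors[1:]:
        result = np.tensordot(result, tensor, axes=1)
    return result.reshape(-1)

def bond_dimensions(tensors):
    """Sizes of the indices that connect neighbouring tensors."""
    return [t.shape[2] for t in tensors[:-1]]

# Entangled GHZ state (|000> + |111>)/sqrt(2) and product state |+++>
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)
plus_state = np.ones(8) / np.sqrt(8)

print(bond_dimensions(state_to_mps(ghz, 3)))         # [2, 2]
print(bond_dimensions(state_to_mps(plus_state, 3)))  # [1, 1]
```

The product state needs bond dimension 1 and the GHZ state bond dimension 2, whereas a generic 3-qubit state would need up to 2 on each bond and a generic $n$-qubit state exponentially large bonds in the middle of the chain.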

*Quantum Approximate Optimization Algorithm (QAOA)*

https://en.wikipedia.org/wiki/Quantum_optimization_algorithms

Farhi, Edward; Goldstone, Jeffrey; Gutmann, Sam (2014). "A Quantum Approximate Optimization Algorithm"

 arXiv:1411.4028: https://arxiv.org/abs/1411.4028

###### *Random circuit sampling*

[New Algorithm Closes Quantum Supremacy Window](https://www.quantamagazine.org/new-algorithm-closes-quantum-supremacy-window-20230109/)

*Random circuit sampling, a popular technique for showing the power of quantum computers, doesn’t scale up if errors go unchecked.*

If you imagine continually increasing the number of qubits as complexity theorists do, and you also want to account for errors, you need to decide whether you’re also going to keep adding more layers of gates — increasing the circuit depth, as researchers say. Suppose you keep the circuit depth constant at, say, a relatively shallow three layers, as you increase the number of qubits. You won’t get much entanglement, and the output will still be amenable to classical simulation. On the other hand, if you increase the circuit depth to keep up with the growing number of qubits, the cumulative effects of gate errors will wash out the entanglement, and the output will again become easy to simulate classically.

###### *Quantum Fourier Transform*

Momentum Space and Position Space with Quantum Fourier Transform

https://youtu.be/W8QZ-yxebFA

> **Quantum Fourier Transform is the change from one basis (computational) to another (Fourier basis)**

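* With the sign convention used here ($e^{+2\pi i x y / N}$), the Quantum Fourier Transform acts on the vector of amplitudes like the inverse Discrete Fourier Transform with normalization $\frac{1}{\sqrt{N}}$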
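The relation to the Discrete Fourier Transform can be checked numerically. The sketch below builds the QFT matrix with entries $\frac{1}{\sqrt{N}} e^{2\pi i x y / N}$ and compares it with NumPy's inverse FFT. The helper name `qft_matrix` and the random test vector are assumptions for this illustration; the comparison relies on NumPy's sign convention, in which `np.fft.ifft` uses the positive exponent.

```python
import numpy as np

def qft_matrix(n_qubits):
    """QFT matrix with entries exp(2*pi*i*x*y/N) / sqrt(N), where N = 2**n_qubits."""
    dim = 2 ** n_qubits
    idx = np.arange(dim)
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

F = qft_matrix(3)

# Random normalized amplitude vector for 3 qubits
rng = np.random.default_rng(0)
amplitudes = rng.normal(size=8) + 1j * rng.normal(size=8)
amplitudes /= np.linalg.norm(amplitudes)

# The QFT is a basis change, so the matrix must be unitary
is_unitary = np.allclose(F.conj().T @ F, np.eye(8))

# norm="ortho" gives the 1/sqrt(N) scaling used by the QFT
matches_inverse_dft = np.allclose(F @ amplitudes, np.fft.ifft(amplitudes, norm="ortho"))

print(is_unitary, matches_inverse_dft)
```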


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_047.png)

**General formula**

* Remember: <font color="blue">$e^{2\pi i}$ = 1</font> (identity operation), and see why $e^{\pi i}$ = -1 in [this video](https://youtu.be/-AyE1Wpgo3Q)


* In QFT we change the <font color="blue">$\theta$ = phase in $e^{2\pi i \theta}$</font> = Eigenvalue of Oracle function $U$ associated with an eigenvector |u⟩

* The phase $\theta$ is expressed as: <font color="blue">$\theta$ = $\frac{x_n}{2^{k_n}}$</font> with:

  * <font color="blue">$x_n$ = 0 or 1</font> state

  * <font color="blue">$k_n$</font> number of Qubits

* This is expressed in a so-called "controlled-R quantum gate" that **applies a relative phase change to |1>**

* The matrix form of this operator is: <font color="blue">$\hat{R}_{k}=\left(\begin{array}{cc}1 & 0 \\ 0 & e^{2 \pi i \frac{x_n}{ 2^{k_n}}}\end{array}\right)$</font>

*Quantum Fourier Transform with 3 Qubits: Introduction*

**Computational Basis States:** <font color="blue">$\tilde{x_1}$ = 0 or 1</font>, <font color="blue">$\tilde{x_2}$ = 0 or 1</font>, <font color="blue">$\tilde{x_3}$ = 0 or 1</font>. Number of Qubits: <font color="blue">$k_1$ = 1, $k_2$ = 2, $k_3$ = 3</font>

> <font color="blue">$\tilde{x_1}$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2^{k_1}}+\frac{x_{2}}{2^{k_2}}+\frac{x_{3}}{2^{k_3}}\right)}|1\rangle\right)$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2^1}+\frac{x_{2}}{2^2}+\frac{x_{3}}{2^3}\right)}|1\rangle\right)$  = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8}\right)}|1\rangle\right)$

* If only $\tilde{x_1}$ is activated, then it is a 180° Z-rotation of $\pi$ radians = -1

* If only $\tilde{x_2}$ is activated, then it is a 90° S-rotation of $\frac{\pi}{2}$ radians = i

* If only $\tilde{x_3}$ is activated, then it is a 45° T-rotation of $\frac{\pi}{4}$ radians = $e^{\frac{\pi i}{4}}$ (between 1 and i on the unit circle)

> <font color="blue">$\tilde{x_2}$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_2}{2^{k_1}}+\frac{x_3}{2^{k_2}}\right)}|1\rangle\right)$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_2}{2^1}+\frac{x_3}{2^2}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_2}{2}+\frac{x_3}{4}\right)}|1\rangle\right)$

* If only $\tilde{x_2}$ is activated, then it is a 180° Z-rotation of $\pi$ radians = -1

* If only $\tilde{x_3}$ is activated, then it is a 90° S-rotation of $\frac{\pi}{2}$ radians = i

* If both $\tilde{x_2}$ and $\tilde{x_3}$ are activated, then it is a 180° + 90° = 270° rotation of $\pi + \frac{\pi}{2}$ radians = -i

> <font color="blue">$\tilde{x_3}$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_3}{2^{k_1}}\right)}|1\rangle\right)$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_3}{2^1}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{\pi \mathrm{i}x_3}|1\rangle\right)$

* If $\tilde{x_3}$ is activated, then it is a 180° Z-rotation of $\pi$ radians = -1
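The three single-qubit expressions above can be multiplied together and compared with the general formula $\frac{1}{\sqrt{N}} \sum_y e^{2\pi i x y / N}|y\rangle$. The sketch below does this for all eight inputs. The function names are assumptions for this illustration, and so is the ordering of the tensor product: the qubit with the shortest phase ($\tilde{x_3}$) is taken as the most significant output qubit.

```python
import numpy as np

def fourier_qubit(bits):
    """Single-qubit state (|0> + exp(2*pi*i*theta)|1>)/sqrt(2), theta = b1/2 + b2/4 + ..."""
    theta = sum(b / 2 ** (k + 1) for k, b in enumerate(bits))
    return np.array([1, np.exp(2j * np.pi * theta)]) / np.sqrt(2)

def qft_product_form(x1, x2, x3):
    """Tensor product of the three Fourier-basis qubits for input |x1 x2 x3>."""
    q1 = fourier_qubit([x1, x2, x3])  # phase x1/2 + x2/4 + x3/8
    q2 = fourier_qubit([x2, x3])      # phase x2/2 + x3/4
    q3 = fourier_qubit([x3])          # phase x3/2
    # Assumed ordering: q3 is the most significant output qubit
    return np.kron(q3, np.kron(q2, q1))

def qft_column(x, dim=8):
    """General formula: amplitudes exp(2*pi*i*x*y/dim)/sqrt(dim) for y = 0..dim-1."""
    y = np.arange(dim)
    return np.exp(2j * np.pi * x * y / dim) / np.sqrt(dim)

all_match = all(
    np.allclose(qft_product_form(x1, x2, x3), qft_column(4 * x1 + 2 * x2 + x3))
    for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)
)
print(all_match)
```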


**Circuit Construction**

*Compare the equations above with the circuit activations below (how a circuit computes the results). For example, for the first qubit the operator / gate $S$ = 90° rotation is only activated if the second qubit $x_2$ is in state 1. Here it is activated because $x_2$ = 1:*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0812.png)

*Here including the 8x8 matrix form for the complete operator:*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0801.png)

*Quantum Fourier Transform with 1 Qubit*

**Computational Basis States:** <font color="blue">$\tilde{x_1}$ = 0 or 1</font>. Number of Qubits: <font color="blue">$k_1$ = 1</font>


*Linear transformation of a qubit in the computational basis 0 and 1 each separately to the Fourier basis:*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0813.png)

**Computational Basis in $|0\rangle$**

> <font color="blue">For $x_1$ = 0 $\Rightarrow$</font> <font color="blue">$\tilde{x_1}$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_1}{2^{k_1}}\right)}|1\rangle\right)$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_1}{2^1}\right)}|1\rangle\right)$  $\Rightarrow$ $\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{0}{2}\right)}$ = $\mathrm{e}^{2 \pi \mathrm{i} 0}$  = $\mathrm{e}^{0}$ = 1 (no rotation)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0810.png)

**Computational Basis in $|1\rangle$**

> <font color="blue">For $x_1$ = 1 $\Rightarrow$</font> <font color="blue">$\tilde{x_1}$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_1}{2^{k_1}}\right)}|1\rangle\right)$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{x_1}{2^1}\right)}|1\rangle\right)$ $\Rightarrow$ $\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{1}{2}\right)}$ = $e^{\pi i 1} =$ <font color="blue">$-1$</font> (180° Z-rotation)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0811.png)

*Quantum Fourier Transform with 1 Qubit is a Hadamard transform!*

**One qubit QFT matrix**: $\frac{1}{\sqrt{2}}\left(\begin{array}{ll}1 & 1 \\ 1 & \mathrm{e}^{\pi i}\end{array}\right)$, where $\mathrm{e}^{\pi \mathrm{i}}$ = -1. So it is: <font color="blue"> QFT for one qubit = $\frac{1}{\sqrt{2}}\left(\begin{array}{ll}1 & 1 \\ 1 & -1\end{array}\right)$

**Compare with Hadamard transform matrix:**

In quantum computing, the Hadamard gate is a one-qubit rotation, mapping the qubit basis states $|0\rangle$ and $|1\rangle$ to two **superposition** states with **equal weight of the computational basis** states $|0\rangle$ and $|1\rangle$. Usually the phases are chosen so that

>$
H=\frac{|0\rangle+|1\rangle}{\sqrt{2}}\langle 0|+\frac{|0\rangle-|1\rangle}{\sqrt{2}}\langle 1|
$

in Dirac notation. This corresponds to the transformation matrix

> <font color="blue">$
H_{1}=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}
1 & 1 \\
1 & -1
\end{array}\right)
$

in the $|0\rangle,|1\rangle$ basis, also known as the computational basis. The states $\frac{|0\rangle+|1\rangle}{\sqrt{2}}$ and $\frac{|0\rangle-|1\rangle}{\sqrt{2}}$ are known as $|+\rangle$ and $|-\rangle$ respectively, and together constitute the polar basis in quantum computing.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_073.jpg)


**Why Hadamard transform is exactly a 1 qubit Quantum Fourier Transform:** (see result of + for 0 state and - for 1 state) - Matrix-Vector-Multiplication (Single Qubit)

> <font color="blue">$H |0\rangle$</font> $ = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right] =\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ 1\end{array}\right]$ <font color="blue">$ \,\,= |+\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$

> <font color="blue">$H |1\rangle$</font>$ = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ -1\end{array}\right]$ <font color="blue">$ = |-\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$

$|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$ because, <font color="gray">given $|0\rangle=\left[\begin{array}{l}1 \\ 0\end{array}\right]$ and $|1\rangle=\left[\begin{array}{l}0 \\ 1\end{array}\right]$, we get:</font> $\frac{1}{\sqrt{2}}\left[\begin{array}{ll}1 + 0 \\ 0 + 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ 1\end{array}\right]$

$|-\rangle=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$ because: $\frac{1}{\sqrt{2}}\left[\begin{array}{ll}1 - 0 \\ 0 - 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ -1\end{array}\right]$
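The matrix-vector products above are small enough to verify directly. The following NumPy sketch (variable names are chosen for this illustration) checks that the one-qubit QFT matrix equals the Hadamard matrix and that it maps $|0\rangle$ to $|+\rangle$ and $|1\rangle$ to $|-\rangle$.

```python
import numpy as np

# Hadamard matrix and one-qubit QFT matrix (second diagonal entry is e^{i*pi} = -1)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
qft_1 = np.array([[1, 1], [1, np.exp(1j * np.pi)]]) / np.sqrt(2)

# Computational basis states and the |+>, |-> states
ket0 = np.array([1, 0])
ket1 = np.array([0, 1])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

print(np.allclose(H, qft_1))        # one-qubit QFT is the Hadamard transform
print(np.allclose(H @ ket0, plus))  # H|0> = |+>
print(np.allclose(H @ ket1, minus)) # H|1> = |->
print(np.allclose(H @ H, np.eye(2)))  # H is self-inverse
```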

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_045.png)

The 2 in the denominator cancels here: for $n = 1$ qubit, $2^n = 2$, so the 2 in the numerator and the 2 in the denominator cancel each other.

*Quantum Fourier Transform with 3 Qubits for $|001\rangle$*

**Computational Basis in $|001\rangle$**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0804.png)

**Fourier Basis for $|001\rangle$**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0805.png)

**Computational States:** <font color="blue">$\tilde{x_1}$ = 0</font>, <font color="blue">$\tilde{x_2}$ = 0</font>, <font color="blue">$\tilde{x_3}$ = 1</font>. Number of Qubits: <font color="blue">$k_1$ = 1 qubit, $k_2$ = 2 qubits, $k_3$ = 3 qubits</font>

> <font color="blue">Qubit 1 = $\tilde{x_1}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2^{k_1}}+\frac{x_{2}}{2^{k_2}}+\frac{x_{3}}{2^{k_3}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{0}{2}+\frac{0}{4}+\frac{1}{8}\right)}|1\rangle\right)$ $\Rightarrow$ phase factor <font color="blue">$e^{\frac{\pi i}{4}}$</font> (45° T-rotation)

> <font color="blue">Qubit 2 = $\tilde{x_2}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{2}}{2^{k_1}}+\frac{x_{3}}{2^{k_2}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{0}{2}+\frac{1}{4}\right)}|1\rangle\right)$ $\Rightarrow$ phase factor $e^{\frac{\pi i}{2}} =$ <font color="blue">$i$</font> (90° S-rotation)

> <font color="blue">Qubit 3 = $\tilde{x_3}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{3}}{2^{k_1}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i} \frac{1}{2}}|1\rangle\right)$ = $e^{\pi i 1} =$ <font color="blue">$-1$</font> (180° Z-rotation)

**Circuit Construction**

*Compare the equations above with the circuit activations below (how a circuit computes the results). For example, for the first qubit the operator / gate $S$ = 90° rotation is only activated if the second qubit $x_2$ is in state 1. Here it is not activated because $x_2$ = 0:*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0812.png)

*Quantum Fourier Transform with 3 Qubits for $|111\rangle$*

**Computational Basis in $|111\rangle$**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0802.png)

**Fourier Basis for $|111\rangle$**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0803.png)

**Computational States:** <font color="blue">$\tilde{x_1}$ = 1</font>, <font color="blue">$\tilde{x_2}$ = 1</font>, <font color="blue">$\tilde{x_3}$ = 1</font>. Number of Qubits: <font color="blue">$k_1$ = 1 qubit, $k_2$ = 2 qubits, $k_3$ = 3 qubits</font>

> <font color="blue">Qubit 1 = $\tilde{x_1}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2^{k_1}}+\frac{x_{2}}{2^{k_2}}+\frac{x_{3}}{2^{k_3}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{1}{2}+\frac{1}{4}+\frac{1}{8}\right)}|1\rangle\right)$ = $\mathrm{e}^{2 \pi i 0.875} = \mathrm{e}^{\pi i 1.75}$ (180° Z-rotation + 90° S-rotation + 45° T-rotation)

> <font color="blue">Qubit 2 = $\tilde{x_2}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{2}}{2^{k_1}}+\frac{x_{3}}{2^{k_2}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{1}{2}+\frac{1}{4}\right)}|1\rangle\right)$ = $e^{\pi i 1.5} =$ <font color="blue">$-i$</font> (180° Z-rotation + 90° S-rotation)

> <font color="blue">Qubit 3 = $\tilde{x_3}$</font> = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{3}}{2^{k_1}}\right)}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i} \frac{1}{2}}|1\rangle\right)$ = $e^{\pi i 1} =$ <font color="blue">$-1$</font> (180° Z-rotation)
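The rotation angles in the two worked examples follow from simple arithmetic on the input bits. As a quick check, the small helper below (a hypothetical function written for this illustration) converts the phase $\theta$ of each output qubit into degrees for $|001\rangle$ and $|111\rangle$.

```python
def fourier_phase_angles(x1, x2, x3):
    """Rotation angle in degrees applied to |1> for each of the three output qubits."""
    theta_1 = x1 / 2 + x2 / 4 + x3 / 8  # qubit 1
    theta_2 = x2 / 2 + x3 / 4           # qubit 2
    theta_3 = x3 / 2                    # qubit 3
    return [360 * theta for theta in (theta_1, theta_2, theta_3)]

print(fourier_phase_angles(0, 0, 1))  # [45.0, 90.0, 180.0]
print(fourier_phase_angles(1, 1, 1))  # [315.0, 270.0, 180.0]
```

For $|111\rangle$ the first qubit is rotated by 180° + 90° + 45° = 315°, which corresponds to the phase factor $e^{1.75 \pi i}$ given above.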

**Circuit Construction**

*Compare the equations above with the circuit activations below (how a circuit computes the results). For example, for the first qubit the operator / gate $S$ = 90° rotation is only activated if the second qubit $x_2$ is in state 1. Here it is activated because $x_2$ = 1:*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0812.png)

*Cirq Code for Quantum Fourier Transform*

*Compare the code below with the circuit activations (how a circuit computes the results):*

* $H$ gate = brings the qubit into superposition.

  * *For $x=0$, no further rotation*

  * *For $x=1$, then apply an additional $Z$ gate = 180° rotation = $\pi$*

* *$S$ gate = 90° rotation = $\frac{\pi}{2}$*

* *$T$ gate = 45° rotation = $\frac{\pi}{4}$*

$C R_{j}=C Z^{1 / 2^{j-1}}$

* $Z$ corresponds to $\pi$ (half a circle, e.g. from +1 to -1 on the X-axis)

* $S$ corresponds to $\frac{\pi}{2}$: if qubit 1 = 1, then apply the $S$ transform to qubit 0 (exponent 0.5)

  * S: The square root of Z gate, equivalent to cirq.Z ** 0.5

  * See: [Cirq Gates](https://quantumai.google/cirq/gates)

* $T$ corresponds to $\frac{\pi}{4}$

In [None]:
!pip install cirq -q
import cirq

def make_qft(qubits):
    """Yield the gates of a QFT circuit on the given qubits (without final swap gates)."""

    # Copy the qubits into a list that is consumed one head-qubit at a time
    qreg = list(qubits)

    # Repeat as long as there are qubits left in the list
    while len(qreg) > 0:

        # Remove the first qubit from the list and use it as the head-qubit
        q_head = qreg.pop(0)

        # Apply a Hadamard gate to put the head-qubit into superposition
        yield cirq.H(q_head)

        # Loop over the remaining qubits: i is the list position, qubit is the control qubit
        for i, qubit in enumerate(qreg):

            # Controlled phase shift CZ ** (1 / 2 ** (i + 1)): multiplies the |11> component
            # of (qubit, q_head) by exp(i * pi / 2 ** (i + 1))
            yield (cirq.CZ ** (1 / 2 ** (i + 1)))(qubit, q_head)

            # The inverse QFT (subroutine in quantum phase estimation) uses the negative exponent:
            # yield (cirq.CZ ** (-1 / 2 ** (i + 1)))(qubit, q_head)

    # The loop stops once qreg is empty, i.e. every qubit has been the head-qubit once

# Use inverse QFT as subroutine in quantum phase estimation
# (make_qft_inverse is not defined in this cell):
# phase_estimator.append(make_qft_inverse(qubits[::-1]))

"""Visually check the QFT circuit."""
qubits = cirq.LineQubit.range(17)
qft = cirq.Circuit(make_qft(qubits))
print(qft)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0815.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0812.png)
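To see that this gate sequence really produces the Fourier basis, the circuit can be re-simulated with plain NumPy and compared with the QFT matrix. The sketch below is not Cirq code: all function names are assumptions for this illustration, and it mirrors `make_qft` only in the order of the gates (Hadamard on the head qubit, then controlled phases $\pi / 2^{i+1}$ from the remaining qubits). Because the circuit has no final swap gates, the comparison reverses the qubit order of the output.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def apply_single_qubit_gate(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_controlled_phase(state, control, target, angle, n):
    """Multiply all amplitudes with control = target = 1 by exp(i * angle)."""
    psi = state.reshape([2] * n).copy()
    index = [slice(None)] * n
    index[control] = 1
    index[target] = 1
    psi[tuple(index)] *= np.exp(1j * angle)
    return psi.reshape(-1)

def run_qft_circuit(state, n):
    """Same gate order as make_qft: H on the head qubit, then controlled phases."""
    state = np.asarray(state, dtype=complex)
    for head in range(n):
        state = apply_single_qubit_gate(state, H, head, n)
        for i, control in enumerate(range(head + 1, n)):
            state = apply_controlled_phase(state, control, head, np.pi / 2 ** (i + 1), n)
    return state

def reverse_qubit_order(state, n):
    """Undo the bit reversal that the missing swap gates would have fixed."""
    return state.reshape([2] * n).transpose(list(range(n))[::-1]).reshape(-1)

def qft_matrix(n):
    dim = 2 ** n
    idx = np.arange(dim)
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

n = 3
F = qft_matrix(n)
circuit_matches = all(
    np.allclose(reverse_qubit_order(run_qft_circuit(np.eye(2 ** n)[x], n), n), F[:, x])
    for x in range(2 ** n)
)
print(circuit_matches)
```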

*Inverse Quantum Fourier Transform ('QFT Dagger': the dagger is the conjugate transpose)*

Reminder of QFT:

* $QFT\,\,|x\rangle=|\tilde{x}\rangle=$ $\frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{\frac{2 \pi i}{N} x y} |y\rangle$

**Remember: the dagger is the conjugate transpose (Hermitian adjoint)!**

Inverse QFT (note the $-2$ in the exponent, which turns $i$ into $-i$, i.e. the complex conjugate of each matrix entry):

* $QFT^{\dagger}|x\rangle=$ $\frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{\frac{-2 \pi i}{N} x y} |y\rangle$, so that $QFT^{\dagger}|\tilde{x}\rangle=|x\rangle$


We have already seen that the Hadamard gate is self-inverse, and the same is clearly true for the SWAP gate; the inverse of the rotation gate $R_k$ is given by:

> The matrix form of the inverse rotation gate is: <font color="blue">${R^{\dagger}}_{k}=\left(\begin{array}{cc}1 & 0 \\ 0 & e^{-2 \pi i / 2^{k}}\end{array}\right)$</font> and compare with the rotation gate of the QFT:  <font color="blue">$\hat{R}_{k}=\left(\begin{array}{cc}1 & 0 \\ 0 & e^{2 \pi i / 2^{k}}\end{array}\right)$

https://www.cl.cam.ac.uk/teaching/1920/QuantComp/Quantum_Computing_Lecture_9.pdf
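A short numerical check makes the role of the dagger concrete. The sketch below (helper names are assumptions for this illustration) builds the QFT matrix, takes its conjugate transpose, and confirms that it undoes the QFT and that $R_k^{\dagger} R_k$ is the identity.

```python
import numpy as np

def qft_matrix(n_qubits):
    """QFT matrix with entries exp(2*pi*i*x*y/N) / sqrt(N)."""
    dim = 2 ** n_qubits
    idx = np.arange(dim)
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

def rotation_gate(k, inverse=False):
    """R_k = diag(1, exp(2*pi*i / 2**k)); the inverse flips the sign of the exponent."""
    sign = -1 if inverse else 1
    return np.array([[1, 0], [0, np.exp(sign * 2j * np.pi / 2 ** k)]])

F = qft_matrix(3)
F_dagger = F.conj().T  # dagger = conjugate transpose

# Applying the QFT and then its dagger returns the original basis state |5>
basis_state = np.eye(8)[5]
recovered = F_dagger @ (F @ basis_state)

print(np.allclose(recovered, basis_state))
print(np.allclose(rotation_gate(3, inverse=True) @ rotation_gate(3), np.eye(2)))
```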

###### *Excursus: Quantum Phase Kickback*

**CNOT Gate and Phase Kickback**

<font color="blue">*CNOT-Gate applied to the computational basis 0 and 1*

* https://qiskit.org/textbook/ch-gates/phase-kickback.html

* Main article about Phase Kickback: https://towardsdatascience.com/quantum-phase-kickback-bb83d976a448

The CNOT-gate is a two-qubit gate. Thus, it transforms two-qubit states, which we represent by a four-dimensional vector.

>$
|\psi\rangle=\alpha|0\rangle|0\rangle+\beta|0\rangle|1\rangle+\gamma|1\rangle|0\rangle+\delta|1\rangle|1\rangle=\left[\begin{array}{c}
\alpha \\
\beta \\
\gamma \\
\delta
\end{array}\right]
$

Recall the vector-vector multiplication (Kronecker / tensor product):

> $\mathbf{uv}$ = $\left[\begin{array}{c}u_{1} \\ u_{2}\end{array}\right]$ $\otimes$ $\left[\begin{array}{c}v_{1} \\ v_{2} \end{array}\right]$ = $\left[\begin{array}{l}u_{1}\left[\begin{array}{l}v_{1} \\ v_{2}\end{array}\right] \\ u_{2}\left[\begin{array}{l}v_{1} \\ v_{2}\end{array}\right]\end{array}\right]$=  $\left[\begin{array}{c}u_{1} v_{1} \\ u_{1} v_{2}\\ u_{2} v_{1} \\ u_{2} v_{2}\end{array}\right]$

> $\left[\begin{array}{l}1 \\ 0\end{array}\right]=|0\rangle, \quad\left[\begin{array}{l}0 \\ 1\end{array}\right]=|1\rangle$.

We choose two qubits in state $|0\rangle$:

> $|0\rangle \otimes|0\rangle = \left[\begin{array}{l}1 \\ 0\end{array}\right] \otimes\left[\begin{array}{l}1 \\ 0\end{array}\right]=$</font> $\left[\begin{array}{l}1\left[\begin{array}{l}1 \\ 0\end{array}\right] \\ 0\left[\begin{array}{l}1 \\ 0\end{array}\right]\end{array}\right]=$ $\left [\begin{array}{l}1 \cdot 1 \\ 1 \cdot 0 \\ 0 \cdot 1 \\ 0 \cdot 0\end{array}\right]$ = <font color="blue">$\left [\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]$

Qubits in two different states:

> $|0\rangle \otimes|1\rangle = \left[\begin{array}{l}1 \\ 0\end{array}\right] \otimes\left[\begin{array}{l}0 \\ 1\end{array}\right]=\left[\begin{array}{l}1\left[\begin{array}{l}0 \\ 1\end{array}\right] \\ 0\left[\begin{array}{l}0 \\ 1\end{array}\right]\end{array}\right]=\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]$

Accordingly, the CNOT-gate has a $4 \times 4$ transformation matrix.

>$
C N O T=\left[\begin{array}{llll}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{array}\right]
$

There is no effect if the control qubit (at the left-hand position in the Dirac notation) is in state |0⟩, as in states |00⟩ and |01⟩.

> CNOT $\cdot|00\rangle=\left[\begin{array}{llll}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{array}\right] \cdot\left[\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]=\left[\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]=|00\rangle$

> CNOT $\cdot|01\rangle=\left[\begin{array}{llll}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]=\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]=|01\rangle$

<font color="blue">But if the control qubit is in state |1⟩, then the controlled (target) qubit switches from |0⟩ to |1⟩ and vice versa.</font>

> CNOT $\cdot|10\rangle=\left[\begin{array}{llll}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 1 \\ 0\end{array}\right]=\left[\begin{array}{l}0 \\ 0 \\ 0 \\ 1\end{array}\right]=|11\rangle$

> CNOT $\cdot|11\rangle=\left[\begin{array}{llll}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 0 \\ 1\end{array}\right]=\left[\begin{array}{l}0 \\ 0 \\ 1 \\ 0\end{array}\right]=|10\rangle$

When we describe the quantum states and operations in terms of mathematical formulae, we use the vectors |0⟩ and |1⟩ as a basis. |0⟩ and |1⟩ denote the standard or computational basis states. These states correspond to the possible measurements we might obtain when looking at the qubit. We measure a qubit in state |0⟩ as 0 with absolute certainty. And, we measure a qubit in state |1⟩ as 1, accordingly. While the basis {|0⟩,|1⟩} is convenient to work with mathematically, it is just a representation of the underlying physics.
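The four matrix-vector products above can be reproduced with a few lines of NumPy. In this sketch (function names are chosen for this illustration) the two-qubit states are built with the Kronecker product, with the control qubit on the left, and the CNOT matrix is applied to each of them.

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

def two_qubit_state(control, target):
    """Kronecker product |control> (x) |target> for bits 0 or 1."""
    kets = (ket0, ket1)
    return np.kron(kets[control], kets[target])

def cnot_on_basis(control, target):
    """Return the output bits (control, target) for a basis-state input."""
    out = CNOT @ two_qubit_state(control, target)
    index = int(np.argmax(out))  # position of the single 1 in the output vector
    return index // 2, index % 2

for c in (0, 1):
    for t in (0, 1):
        out_c, out_t = cnot_on_basis(c, t)
        print(f"CNOT|{c}{t}> = |{out_c}{out_t}>")
```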


<font color="blue">*CNOT-Gate applied to the superposition basis + and -*

The mathematical basis we chose leads to a specific representation of the CNOT-transformation. But this is not the only possible representation. In fact, there are infinitely many other possible choices. Our qubits are not limited to these two states. Qubits can be in a superposition of both states. For instance, there are the states that result from applying the Hadamard-gate on the basis states:

> $|+\rangle=\left[\begin{array}{c}\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}\end{array}\right]$ and $|-\rangle=\left[\begin{array}{c}\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}}\end{array}\right]$

Remember: Apply Hadamard gate on a qubit that is in the |0> state:

> $\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ 1\end{array}\right]$

Now apply Hadamard gate on a qubit that is in the |1> state:

> $\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ -1\end{array}\right]$

Mathematically, the following matrix represents the application of Hadamard gates on each of the two qubits.

> $H \otimes H=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}H & H \\ H & -H\end{array}\right]=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right]$

So, if we apply this matrix on two qubits in state |00⟩, they end up in state |++⟩.

> $\begin{aligned} H \otimes H(|00\rangle) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right] \cdot\left[\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{l}1 \\ 1 \\ 1 \\ 1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle+|0\rangle|1\rangle+|1\rangle|0\rangle+|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \\ &=|++\rangle \end{aligned}$

The input state |01⟩ results in state |+−⟩.

> $\begin{aligned} H \otimes H(|01\rangle) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ -1 \\ 1 \\ -1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle-|0\rangle|1\rangle+|1\rangle|0\rangle-|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \\ &=|+-\rangle \end{aligned}$

The input state |10⟩ results in state |−+⟩.

> $\begin{aligned} H \otimes H(|10\rangle) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 1 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ 1 \\ -1 \\ -1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle+|0\rangle|1\rangle-|1\rangle|0\rangle-|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \\ &=|-+\rangle \end{aligned}$

Finally, if we apply this transformation on two qubits in state |11⟩, we put them into state |−−⟩.

> $\begin{aligned} H \otimes H(|11\rangle) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 0 \\ 1\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ -1 \\ -1 \\ 1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle-|0\rangle|1\rangle-|1\rangle|0\rangle+|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \\ &=|--\rangle \end{aligned}$

Now, let’s apply the CNOT-gate on qubits in superposition. We can calculate the overall transformation matrix by multiplying the matrices of the CNOT-gate and the H⊗H transformation. Left-multiplying by the CNOT-gate swaps the third and fourth rows of the H⊗H-matrix (for this particular matrix, that has the same effect as swapping its second and fourth columns).

> $\operatorname{CNOT}(H \otimes H)=\left[\begin{array}{cccc}1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{array}\right] \cdot \frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1\end{array}\right]=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1\end{array}\right]$

* And now, we apply this transformation to the four combinations of basis states.

<font color="blue">If the target qubit (at the right-hand side) is in state |−⟩, the state of the control qubit (at the left-hand side) flips from |+⟩ to |−⟩ and vice versa:

>

> $\begin{aligned} \operatorname{CNOT}(H \otimes H(|00\rangle)) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1\end{array}\right] \cdot\left[\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{l}1 \\ 1 \\ 1 \\ 1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle+|0\rangle|1\rangle+|1\rangle|0\rangle+|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \\ &=|++\rangle \end{aligned}$

> $\begin{aligned} \operatorname{CNOT}(H \otimes H(|01\rangle)) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ -1 \\ -1 \\ 1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle-|0\rangle|1\rangle-|1\rangle|0\rangle+|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \\ &=|--\rangle \end{aligned}$

> $\begin{aligned} \operatorname{CNOT}(H \otimes H(|10\rangle)) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 1 \\ 0\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ 1 \\ -1 \\ -1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle+|0\rangle|1\rangle-|1\rangle|0\rangle-|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \\ &=|-+\rangle \end{aligned}$

> $\begin{aligned} \operatorname{CNOT}(H \otimes H(|11\rangle)) &=\frac{1}{2}\left[\begin{array}{cccc}1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1\end{array}\right] \cdot\left[\begin{array}{l}0 \\ 0 \\ 0 \\ 1\end{array}\right]=\frac{1}{2}\left[\begin{array}{c}1 \\ -1 \\ 1 \\ -1\end{array}\right] \\ &=\frac{1}{2}(|0\rangle|0\rangle-|0\rangle|1\rangle+|1\rangle|0\rangle-|1\rangle|1\rangle) \\ &=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle-|1\rangle) \\ &=|+-\rangle \end{aligned}$

In short, we can say:

> $\operatorname{CNOT}(|++\rangle)=|++\rangle$

> $\operatorname{CNOT}(|+-\rangle)=|--\rangle$

> $\operatorname{CNOT}(|-+\rangle)=|-+\rangle$

> $\operatorname{CNOT}(|--\rangle)=|+-\rangle$

The two states |+⟩ and |−⟩ have the same measurement probabilities for |0⟩ and |1⟩: each outcome occurs with a probability of 0.5. **So, in this basis the CNOT-gate does not have any directly measurable implications**. <font color="blue">However, the control qubit switches its phase whenever the controlled (target) qubit is in state |−⟩, while the target qubit itself stays unchanged.</font>

> Because the phase of the target qubit is kicked back to the control qubit, this phenomenon is called phase kickback.

We learned the CNOT-gate is not a one-sided operation. It clearly has the potential to affect the state of the control qubit. Even though the phase is not directly measurable, there are ways to exploit differences in the phase between states. In fact, prominent algorithms, such as Grover’s search algorithm, exploit this effect.
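The four short rules for CNOT in the $|+\rangle, |-\rangle$ basis can be confirmed numerically. The sketch below (the function name and the string labels are assumptions for this illustration) applies the CNOT matrix to each product state, with the control qubit on the left, and identifies which product state comes out.

```python
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
states = {"+": plus, "-": minus}

CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

def cnot_in_hadamard_basis(control, target):
    """Apply CNOT to |control target> given as '+'/'-' labels; return the output labels."""
    out = CNOT @ np.kron(states[control], states[target])
    for c in "+-":
        for t in "+-":
            if np.allclose(out, np.kron(states[c], states[t])):
                return c + t
    raise ValueError("output is not a product of |+> and |-> states")

for c in "+-":
    for t in "+-":
        print(f"CNOT|{c}{t}> = |{cnot_in_hadamard_basis(c, t)}>")
```

The target label never changes, while the control label flips exactly when the target is in $|-\rangle$.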


###### *Quantum Phase Estimation*

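<font color="red">**Recap of the CNOT gate and phase kickback:**
* CNOT in the computational basis: the target qubit changes from 0 to 1 or from 1 to 0 if the control qubit is in state 1,
* Phase kickback: the control qubit changes from (+) to (-) or from (-) to (+) if the target qubit is in (-)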

**Quantum Phase Estimation**

* algorithm for determining the eigenvalues of a unitary operator

* the [quantum phase estimation algorithm](https://en.m.wikipedia.org/wiki/Quantum_phase_estimation_algorithm) (also referred to as quantum eigenvalue estimation algorithm), is a quantum algorithm to estimate the phase (or eigenvalue) of an eigenvector of a unitary operator.

* More precisely, given a unitary matrix $U$ and a quantum state $|\psi\rangle$ such that $U|\psi\rangle=e^{2 \pi i \theta}|\psi\rangle$, the algorithm estimates the value of $\theta$ with high probability within additive error $\varepsilon$, using $O(\log (1 / \varepsilon))$ qubits (without counting the ones used to encode the eigenvector state) and $O(1 / \varepsilon)$ controlled- $U$ operations.

* The algorithm was initially introduced by Alexei Kitaev in 1995.

* Phase estimation is frequently used as a subroutine in other quantum algorithms, such as Shor's algorithm and the quantum algorithm for linear systems of equations.

<font color="blue">*One Qubit Phase Estimation (with Hadamard Gate):*

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_117.png)

<font color="blue">*Multi-Qubit Phase Estimation (with inverse Quantum Fourier Transform):*

![gg](https://upload.wikimedia.org/wikipedia/commons/a/a5/PhaseCircuit-crop.svg)

Remember in **Quantum Fourier Transform**:


> x1 = $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8}\right)}|1\rangle\right)$

* <font color="blue">$e^{2\pi i}$ = 1 = identity</font>

* In Quantum Fourier Transform we change the phase <font color="blue">$\theta$ in $e^{2\pi i}$</font> <font color="red">$^{\theta}$</font>

  * <font color="red">= Eigenvalue of Oracle function $U$ associated with an eigenvector |u⟩</font>

* Phase <font color="blue">$\theta$ is a sum of terms $\frac{x_k}{2^{k}}$ with each $x_k$ equal to 0 or 1</font>, where $k$ runs over the qubits.

* A controlled-R quantum gate applies a relative phase change to |1>. The matrix form of this operator is: <font color="blue">$\hat{R}_{k}=\left(\begin{array}{cc}1 & 0 \\ 0 & e^{2 \pi i / 2^{k}}\end{array}\right)$

**Now in Phase Estimation**:

> In $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8}\right)}|1\rangle\right)$ this part is the **phase $\theta$** = $(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8})$, which in phase estimation is written into the control qubits by the controlled operators $U^{2^{j}}$.

Quantum phase estimation addresses the following problem:
* We have an $n$-qubit oracle function $U$, encoded in the form of a controlled-$U$ unitary.
* **$U$ has an eigenvalue $e^{2 \pi i \phi}$, associated with an eigenvector $|u\rangle$ which we can prepare.**
* <font color="red">**We wish to estimate the phase, $\phi$, of the eigenvalue to $t$ bits of precision.**

> <font color="blue">**Given a unitary operator $U$, the algorithm estimates $\theta$ in $U|\psi\rangle=e^{2 \pi i \theta}|\psi\rangle$** $\quad$ (based on Eigenvalue equation)</font>

* Here $|\psi\rangle$ is an eigenvector / eigenstate and $e^{2 \pi i \theta}$ is the corresponding eigenvalue.

* <font color="red">For example: the eigenvalues of X are −1 and 1, with the eigenvectors |−⟩ and |+⟩ respectively.</font>

*Since $U$ is unitary, all of its eigenvalues have a norm of 1.*

Reminder: QFT

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_084.png)

>**See below: <font color="red">Remember that a unitary matrix has eigenvalues of the form $e^{i \theta_{\psi}}$ (without the $2 \pi$ used above in the QFT) and that it has eigenvectors that form an orthonormal basis**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_083.png)

The problem: in both cases the probability is 0.5; the states differ only by the added phase $e^{\frac{i \pi}{2}}$.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_078.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_079.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_080.png)

**The probabilities of measuring 0 and 1 are each 0.5, plus a small correction that makes them differ from 0.5, depending on the phase (angle):**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_081.png)

> **In the difference between the probabilities of measuring 0 and 1, you have encoded that phase! (In other words: you have taken the phase information and turned it into an amplitude that you can measure.)**
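How the phase shows up in the outcome statistics can be sketched with a small classical simulation. This is only an illustration under an assumption: it models the plain variant H, controlled-$U$, H on one control qubit, for which $P(0) = \frac{1}{2}(1+\cos 2\pi\theta)$; the figures above may use a variant with an extra phase gate. The function name and parameters are hypothetical.

```python
import math
import random

def estimate_theta_from_shots(true_theta, shots=100_000, seed=0):
    """Sample the one-qubit circuit and recover theta from the counts."""
    rng = random.Random(seed)
    # After the second Hadamard: P(0) = (1 + cos(2*pi*theta)) / 2.
    p0 = (1 + math.cos(2 * math.pi * true_theta)) / 2
    zeros = sum(rng.random() < p0 for _ in range(shots))
    p0_hat = zeros / shots
    # Invert the formula; the clamp guards against values just outside [-1, 1].
    cos_value = max(-1.0, min(1.0, 2 * p0_hat - 1))
    return math.acos(cos_value) / (2 * math.pi)

theta_hat = estimate_theta_from_shots(0.125)
print(theta_hat)
```

Because the cosine is symmetric, this variant cannot distinguish $\theta$ from $1-\theta$, and the accuracy only improves slowly with the number of shots. Both limitations motivate the multi-qubit version with the inverse QFT.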

* How to do this experimentally: you do a million shots of the experiment, collect statistics and check what the statistics say. How many times did I get zero? How many times did I get one? The hope is that the difference between the statistics of zero and one would allow us to back out theta

* Next level: now getting more precision with more qubits: (there is another circuit to prepare Psi yet, which is assumed to be given here)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_082.png)

writing out the calculation:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_085.png)

**Comparing QPE with QFT (QPE is the same as QFT with a different phase):**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_086.png)

It's like applying a QFT of something (of a special phase $\frac{\theta_{\psi}}{2^{n}} 2 \pi$, the green box above!), and in order to get back to the original state you need to apply an inverse QFT at the end:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_087.png)

*Step 1: Set up the unitary and number of bits to use in phase estimation*

<font color="blue">*Let's take as an example the T-gate, and use Quantum Phase Estimation to estimate its phase.*

You will remember that the $T$-gate adds a phase of $e^{\frac{i \pi}{4}}$ to the state $|1\rangle$ :

$
T|1\rangle=\left[\begin{array}{ll}
1 & 0 \\
0 & e^{\frac{i \pi}{4}}
\end{array}\right]\left[\begin{array}{l}
0 \\
1
\end{array}\right]=e^{\frac{i \pi}{4}}|1\rangle
$

Since QPE will give us $\theta$ where: $
T|1\rangle=e^{2 i \pi \theta}|1\rangle
$

<font color="red">We expect to find theta: $
\theta=\frac{1}{8}
$
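The expected value can be derived directly from the eigenvalue. A minimal sketch using only the standard library (variable names are illustrative): take the lower-right entry of the $T$ matrix and solve $e^{2 \pi i \theta} = e^{i\pi/4}$ for $\theta$.

```python
import cmath
import math

# Eigenvalue of the T gate on |1>: the lower-right matrix entry e^{i*pi/4}.
t_eigenvalue = cmath.exp(1j * math.pi / 4)

# Solve e^{2*pi*i*theta} = eigenvalue for theta in [0, 1).
theta_expected = (cmath.phase(t_eigenvalue) / (2 * math.pi)) % 1.0
print(theta_expected)
```

Up to floating-point rounding this gives 1/8, which in binary is 0.001 and therefore needs exactly three bits.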

We first perform a Hadamard gate on the first qubit to get the state

  * Original state of both qubits: $|0\rangle \otimes|\psi\rangle$

  * Hadamard on first qubit: $|+\rangle \otimes|\psi\rangle$

  * <font color="red">Distribute superposition: $|0\rangle|\psi\rangle+|1\rangle|\psi\rangle$</font>

  * <font color="blue">this part above is the rule from tensor products: If the state of the first particle is a superposition of two states, the state of the two-particle system is also a superposition: $\left(v_{1}+v_{2}\right) \otimes w=v_{1} \otimes w+v_{2} \otimes w$
</font>

    * The Hadamard states ∣+⟩ and ∣−⟩ are considered superposition states because they are a combination of the two computational states:

    * State: $|\pm\rangle=\frac{1}{\sqrt{2}}|0\rangle \pm \frac{1}{\sqrt{2}}|1\rangle$, so for + it is: $|+\rangle=\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$

  * we have intentionally omitted the normalization factor of 1/√2 for clarity

> $|+\rangle \otimes|\psi\rangle = \frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ 1\end{array}\right] \otimes\left[\begin{array}{l}\psi\end{array}\right]= \frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \, [\psi] \\ 1 \, [\psi]\end{array}\right]$

Remember: Apply Hadamard gate on a qubit that is in the |0> state:

> $|+\rangle$ = $\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{l}1 \\ 1\end{array}\right]$ =  $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$

In [None]:
import cirq
import matplotlib.pyplot as plt
import numpy as np

# Value of θ which appears in the definition of the unitary U above.
# Try different values.
theta = 0.125

# Define the unitary U-Gate: Z**t = diag(1, e^{iπt}), so U = diag(1, e^{2πiθ}).
U = cirq.Z ** (2 * theta)

# Accuracy of the estimate for theta. Try different values.
n_bits = 3

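Details about the unitary $U$ gate:

$U$ = $Z^{2\theta}$

$Z$ = $\operatorname{diag}(1, e^{i\pi})$

* $Z$ corresponds to a phase of $\pi$ on $|1\rangle$ (half a circle, e.g. from +1 to -1 on the X-axis)


then:

> <font color="blue">$U$ = $\operatorname{diag}(1, e^{2 \pi i \theta})$, so that $U|1\rangle = e^{2 \pi i \theta}|1\rangle$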


*Step 2: Build the first part of the circuit for phase estimation with controlled U-gate (Phase Kickback)*

We then perform a controlled-$U$ operation, which we have written as $U^{2^0}$. This is where **phase kickback** applies!

  * $|0\rangle|\psi\rangle+|1\rangle$ <font color="red">$U$</font> $|\psi\rangle$ =

  * $|0\rangle|\psi\rangle+$ <font color="red">$e^{2 \pi i 0. \phi_{1}}$</font> $|1\rangle|\psi\rangle$ =

  * $(|0\rangle+$ <font color="red">$e^{2 \pi i 0. \phi_{1}}$</font> $|1\rangle) \otimes|\psi\rangle$

* Here are 2 things very important:

    * The second qubit register containing |ψ⟩ hasn’t changed. We shouldn’t expect it to, **since |ψ⟩ is an eigenstate of U** (Remember: <font color="blue">**Given a unitary operator $U$, the algorithm estimates $\theta$ in $U|\psi\rangle=e^{2 \pi i \theta}|\psi\rangle$ based on the eigenvalue equation**</font>). **Thus, no matter how many times we apply U to this register, nothing happens to |ψ⟩**. Applying U more often does, however, multiply the phase that is kicked back (this is not amplitude amplification): each additional control qubit uses a higher power of U and so exposes one more binary digit of $\phi$, which gives more precision.

    * what’s the point of applying U then? The effect was that **it wrote some information about the eigenvalue into the relative phase of the first qubit**. Namely, the entire effect was to
map: $|0\rangle+|1\rangle \mapsto|0\rangle+e^{2 \pi i 0. \phi_{1}}|1\rangle$
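This map can be verified with small matrices. The sketch below assumes $U = \operatorname{diag}(1, e^{2\pi i\theta})$ with eigenvector $|1\rangle$, as in the T-gate example; the names `theta_demo`, `u_matrix` and `cu_matrix` are illustrative and chosen so that they do not overwrite the notebook's own variables.

```python
import numpy as np

theta_demo = 0.125
u_matrix = np.diag([1, np.exp(2j * np.pi * theta_demo)])

# Controlled-U with the first qubit as control: identity block, then U block.
cu_matrix = np.eye(4, dtype=complex)
cu_matrix[2:, 2:] = u_matrix

plus = np.array([1, 1]) / np.sqrt(2)
eigenvector = np.array([0, 1])          # |psi> = |1>

state_after = cu_matrix @ np.kron(plus, eigenvector)

# Expected: (|0> + e^{2 pi i theta}|1>)/sqrt(2) on the control, |psi> untouched.
control_after = np.array([1, np.exp(2j * np.pi * theta_demo)]) / np.sqrt(2)
kicked_back = np.allclose(state_after, np.kron(control_after, eigenvector))
print(kicked_back)
```

The result factorizes again into control and target, which confirms that only the relative phase of the control qubit has changed.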

In [None]:
# Get qubits for the phase estimation circuit.
qubits = cirq.LineQubit.range(n_bits)
u_bit = cirq.NamedQubit('u')

# Build the first part of the phase estimation circuit.
phase_estimator = cirq.Circuit(cirq.H.on_each(*qubits))

# Set the input state of the eigenvalue register: Add gate to change initial state to |1>
phase_estimator.insert(0, cirq.X(u_bit))

# bit = cirq.LineQubit
for i, bit in enumerate(qubits):
    phase_estimator.append(cirq.ControlledGate(U).on(bit, u_bit) ** (2 ** (n_bits - i - 1)))
    # Explanation: the controlled U only acts on the branch where the control qubit is in state |1>.
    # The powers applied are U^(2^(n-1)), U^(2^(n-2)), ..., U^(2^0).

print(phase_estimator)

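> <font color="blue">The gate controlled by qubit $i$ is $U^{2^{n-1-i}}$ = $Z^{2\theta \cdot 2^{n-1-i}}$. For $\theta = 0.125$ and $n = 3$ the three controlled gates are therefore $Z^{1}$, $Z^{0.5}$ and $Z^{0.25}$.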


*Why are we adding Pauli-X? The initial state for u_bit is the $|0\rangle$ state, but the phase for this state is trivial with the operator we chose. Inserting a Pauli $X$ operator at the beginning of the circuit changes this to the $|1\rangle$ state, which has the nontrivial $\theta$ phase.*

*The controlled u gate*:

$|00\rangle \mapsto|00\rangle$

$|01\rangle \mapsto|01\rangle$

$|10\rangle \mapsto|1\rangle \otimes U|0\rangle=|1\rangle \otimes\left(u_{00}|0\rangle+u_{10}|1\rangle\right)$

$|11\rangle \mapsto|1\rangle \otimes U|1\rangle=|1\rangle \otimes\left(u_{01}|0\rangle+u_{11}|1\rangle\right)$

The matrix representing the controlled $U$ is

>$
\mathrm{C} U=\left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & u_{00} & u_{01} \\
0 & 0 & u_{10} & u_{11}
\end{array}\right]
$

**When U is one of the Pauli operators, X,Y, Z, the respective terms "controlled-X", "controlled-Y", or "controlled-Z" are sometimes used**.
Sometimes this is shortened to just CX, CY and CZ.
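The block structure of the controlled-$U$ matrix can be reproduced in a few lines. This is a sketch with a hypothetical helper `controlled`; it uses the Hadamard gate as an example $U$ because its off-diagonal entries make the mapping of $|10\rangle$ visible.

```python
import numpy as np

def controlled(u):
    """Return the 4x4 controlled-U matrix for a 2x2 unitary u (control = first qubit)."""
    cu = np.eye(4, dtype=complex)
    cu[2:, 2:] = u
    return cu

# Example unitary with non-zero off-diagonal entries: the Hadamard gate.
h_matrix = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ch_matrix = controlled(h_matrix)

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

# |10> maps to |1> (x) U|0>, while |00> is left untouched.
maps_10 = np.allclose(ch_matrix @ np.kron(ket1, ket0), np.kron(ket1, h_matrix @ ket0))
fixes_00 = np.allclose(ch_matrix @ np.kron(ket0, ket0), np.kron(ket0, ket0))
print(maps_10, fixes_00)
```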

<font color="blue">*Why should we use more than one control Qubit?*

**Remember from Eigenvalue problem: Ax = λx in our case with the unitary operator: Ux = λx**

> $Ux =$ <font color="red">$e^{2πi*0.\varphi_{1} \varphi_{2} \cdots \varphi_{n} }$</font> $x$

> Example: if $0.\varphi_{1} \varphi_{2} \cdots \varphi_{n} = 0$, then $e^{2πi*0}$ = λ = 1, so that Ux = 1x. Hence λ = 1 is the eigenvalue of $U$.

* Since |λ| = 1, we can write it without loss of generality as λ = $e^{2πiφ}$. <font color="red">Because $e^{2πi}$ = 1 (from Euler's identity, [Source](https://www.physicsforums.com/threads/e-2-pi-i-where-from.430393/)), adding an integer to φ does not change λ,</font> so we can restrict to **0 ≤ φ < 1, which is called the phase. This is what we want to estimate!**

* We saw that in QFT, φ being between 0 and 1 $\rightarrow$ $\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8}\right)}$

> **The term “estimation” comes about not from the fact that quantum computation is probabilistic, but rather in the degree of precision that we are going to compute, or estimate, the phase to.**

* The phase φ is going to be between zero and one, so we can write it as a decimal in binary notation as follows: $φ = 0.φ_1 φ_2 ···φ_n$, where each φi is either zero or one

  * The expression $\phi=0 . \phi_{1} \phi_{2} \cdots \phi_{n}$ is shorthand for $\phi=\sum_{k=1}^{n} \phi_{k} 2^{-k}$. Some numbers as binary fractions:

    * <font color="blue">The number 0.5 in decimal is 0.1 in binary, since 0.1 ≡ (1) · $2^{−1}$ = 1/2 = 0.5. So: $0.5_{10} = 0.1_2$. Note that 0.1 is the same as 0.100000....</font>

    * <font color="blue">The number 0.75 in decimal is 0.11 in binary, since 0.11 ≡ (1)·$2^{−1}$ +1·$2^{−2}$ = 1/2+1/4 = 3/4 = 0.75. To get this we need 2 Qubits. So we get more precision with more qubits</font>

    * 0.111 = 0.875

    * 0.1111 = 0.9375 in decimal, because: $0 \cdot 2^{0}+1 \cdot 2^{-1}+1 \cdot 2^{-2}+1 \cdot 2^{-3}+1 \cdot 2^{-4}=0 \cdot 1+1 \cdot 0.5+1 \cdot 0.25+1 \cdot 0.125+1 \cdot 0.0625=0+0.5+0.25+0.125+0.0625=0.9375_{10}$

  * Check also what is the value of the infinitely repeating binary decimal 0.1111111...

  * If it needed to be proved, the above exercise proves that 0 ≤ 0.φ1φ2 · · · ≤ 1
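The binary fractions above can be reproduced with a short helper. This is a sketch; `binary_fraction_to_decimal` is a hypothetical name, and its argument is the digit string after the binary point.

```python
def binary_fraction_to_decimal(bits):
    """Convert the digits after the binary point, e.g. '111', to a decimal value."""
    return sum(int(b) * 2 ** -(k + 1) for k, b in enumerate(bits))

for bits in ["1", "11", "111", "1111"]:
    print(f"0.{bits} (binary) = {binary_fraction_to_decimal(bits)}")

# Partial sums of 0.1111... approach 1 without exceeding it.
long_run = binary_fraction_to_decimal("1" * 30)
print(long_run)
```

With $n$ ones the value is $1 - 2^{-n}$, so the infinitely repeating fraction 0.111... converges to 1.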


*Operator $U^{2^n}$ in QPE*

* $U^{2^0}$: 1 (decimal) = 00001

* $U^{2^1}$: 2 (decimal) = 00010

* $U^{2^2}$: 4 (decimal) = 00100

* $U^{2^3}$: 8 (decimal) = 01000

* $U^{2^4}$: 16 (decimal) = 10000


**So if the phase is 0.111 (in binary), QPE with 3 qubits would compute:**

* $e^{2 \pi i 0. \varphi_{1} \varphi_{2} \varphi_{3}}$</font> = $e^{2 \pi i 0.(U^{2^0} + U^{2^1} + U^{2^2})}$  = <font color="red">$e^{2 \pi i 0.(001 + 010 + 100)}$</font>  = $e^{2 \pi i 0.111}$

  * $2^0$ = 1 in decimal = 001 in binary

  * $2^1$ = 2 in decimal = 010 in binary

  * $2^2$ = 4 in decimal = 100 in binary

* in this case the phase $\theta$ = 0.111 in binary, i.e. 0.875 in decimal


**Compare that with Quantum Fourier Transform:**

* In $\frac{1}{\sqrt{2}}\left(|0\rangle+\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8}\right)}|1\rangle\right)$ this part is the **phase $\theta$** = $(\frac{\mathrm{x}_{1}}{2}+\frac{x_{2}}{4}+\frac{x_{3}}{8})$

* Let's say all $x_1, x_2$ and $x_3$ = 1 $\rightarrow$ $\mathrm{e}^{2 \pi \mathrm{i}\left(\frac{{1}}{2}+\frac{1}{4}+\frac{1}{8}\right)}$ = $\mathrm{e}^{2 \pi \mathrm{i}(0.5+0.25+0.125)}$ and in binary form: <font color="red">$\mathrm{e}^{2 \pi \mathrm{i}(0.100+0.010+0.001)}$</font>

* **We see that in QFT and QPE it's the same (both in red)!**

<font color="red">Each $U^{2^n}$ acts only on the branch where the corresponding control qubit at the top is in state 1 (see the figure below):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_118.png)

*Step 3: Perform the inverse QFT on the estimation qubits and measure them*


How can we read out this information from the quantum state? Consider the effect of applying another Hadamard transformation on the first qubit (without another H we will always measure 50/50 % a 0 or 1), which will produce (ignoring the normalization factor of 1/2):

  * $H(|0\rangle+$ <font color="red">$e^{2 \pi i 0. \phi_{1}}$</font> $|1\rangle)=$ $(1+$<font color="red">$e^{2 \pi i 0. \phi_{1}}$</font>$)|0\rangle$ + $(1-$<font color="red">$e^{2 \pi i 0. \phi_{1}}$</font>$)|1\rangle$

  * this moves the phase information into the amplitudes of the first qubit and allows us to read it out

  * Now, $\phi_{1}$ can only be zero or one. In the case that $\phi_{1}=0$, $e^{2 \pi i 0 . \phi_{1}}=1$, hence the state is exactly $|0\rangle$: $\frac{1}{2}\left(1+e^{2 \pi i 0 . 0}\right)|0\rangle+\frac{1}{2}\left(1-e^{2 \pi i 0 . 0}\right)|1\rangle$ = $\frac{1}{2}\left(1+1\right)|0\rangle+\frac{1}{2}\left(1-1\right)|1\rangle$ = $|0\rangle$. In the case that $\phi_{1}=1$, $e^{2 \pi i 0 . 1}=e^{i \pi}=-1$, hence the state is exactly $|1\rangle$.

  * the values in front of $|0\rangle$ and $|1\rangle$ are amplitudes; their squared magnitudes are the measurement probabilities (here 0 is measured with probability 1, but in other cases small differences reveal the phase and hence the eigenvalue). See here:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_081.png)

* **For 1 Qubit we can use a Hadamard Gate, and for more than 1 Qubit we use the inverse Fourier Transform**: after the controlled-$U$ powers, the counting register in Quantum Phase Estimation is $\frac{1}{2^{\frac{n}{2}}} \sum_{k=0}^{2^{n}-1} e^{2 \pi i \theta k}|k\rangle$. The inverse Quantum Fourier transform maps each $|k\rangle$ to <font color="red">$ \frac{1}{2^{\frac{n}{2}}} \sum_{x=0}^{2^{n}-1} e^{\frac{-2 \pi i k x}{2^{n}}}|x\rangle$</font>, so that the final state is: $\frac{1}{2^{n}} \sum_{k=0}^{2^{n}-1} \sum_{x=0}^{2^{n}-1} e^{2 \pi i \theta k}$ <font color="red">$e^{\frac{-2 \pi i k x}{2^{n}}}|x\rangle$</font>

  * inverse QFT for 1 Qubit: $|k\rangle \mapsto \frac{1}{2^{\frac{1}{2}}} \sum_{x=0}^{2^{1}-1} e^{\frac{-2 \pi i k x}{2^{1}}}|x\rangle$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+e^{-\pi i k}|1\rangle\right)$ = $\frac{1}{\sqrt{2}}\left(|0\rangle+(-1)^{k}|1\rangle\right)$ for $k \in \{0,1\}$, which is exactly the Hadamard gate.

Thus, we measure with certainty (i.e., not probabilistically) a state that tells us exactly what the phase,
and hence the eigenvalue, is.
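The readout can be sketched with plain linear algebra, independently of the circuit code. The function below is an illustration (its name and arguments are hypothetical): it builds the counting-register state $\frac{1}{2^{n/2}}\sum_k e^{2\pi i\theta k}|k\rangle$, applies the inverse QFT as a matrix, and returns the outcome probabilities.

```python
import numpy as np

def qpe_distribution(phase, n_counting):
    """Outcome probabilities of ideal phase estimation with n_counting qubits."""
    dim = 2 ** n_counting
    k = np.arange(dim)
    # State of the counting register after the controlled-U powers.
    register = np.exp(2j * np.pi * phase * k) / np.sqrt(dim)
    # Inverse QFT matrix: entry (x, k) is exp(-2*pi*i*k*x / dim) / sqrt(dim).
    inverse_qft = np.exp(-2j * np.pi * np.outer(k, k) / dim) / np.sqrt(dim)
    return np.abs(inverse_qft @ register) ** 2

probs = qpe_distribution(phase=0.125, n_counting=3)
most_likely = int(np.argmax(probs))
print(most_likely, most_likely / 2 ** 3)
```

For a phase that fits exactly into the available bits, such as 0.125 with three qubits, all probability sits on a single outcome. For other phases the distribution is peaked around the closest $n$-bit fractions instead, so the outcome is no longer certain.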

In [None]:
def make_qft_inverse(qubits):
    """Generator for the inverse QFT on a list of qubits."""
    qreg = list(qubits)[::-1]
    while len(qreg) > 0:
        q_head = qreg.pop(0)
        yield cirq.H(q_head)
        for i, qubit in enumerate(qreg):
            yield (cirq.CZ ** (-1 / 2 ** (i + 1)))(qubit, q_head)

# Do the inverse QFT
phase_estimator.append(make_qft_inverse(qubits[::-1]))

# Add measurements to the end of the circuit
phase_estimator.append(cirq.measure(*qubits, key='m'))
print(phase_estimator)

In [None]:
# Syntax explanation for list(qubits)[::-1]: list[<start>:<stop>:<step>]
# So, when you do a[::-1], it starts from the end towards the first taking each element.
# So it reverses a. This is applicable for lists/tuples as well.
# Example: >>> a = '1234' >>> a[::-1] will get you: '4321'

*Step 4: Simulate the circuit and convert from measured bit values to estimated θ values*

In [None]:
# Simulate the circuit.
sim = cirq.Simulator()
result = sim.run(phase_estimator, repetitions=10)

# Convert from output bitstrings to estimate θ values.
theta_estimates = np.sum(2 ** np.arange(n_bits) * result.measurements['m'], axis=1) / 2**n_bits
print(theta_estimates)

In [None]:
"""Plot the results."""
# Matplotlib 3.6+ renamed this style; older versions call it "seaborn-whitegrid".
plt.style.use("seaborn-v0_8-whitegrid")

plt.plot(theta_estimates, "--o", label="Phase estimation")
plt.axhline(theta, label="True value", color="black")

plt.legend()
plt.xlabel("Number of trials")
plt.ylabel(r"$\theta$");

In [None]:
def phase_estimation(theta, n_bits, n_reps=10, prepare_eigenstate_gate=cirq.X):
    # Define qubit registers.
    qubits = cirq.LineQubit.range(n_bits)
    u_bit = cirq.NamedQubit('u')

    # Define the unitary U.
    U = cirq.Z ** (2 * theta)

    # Start with Hadamards on every qubit.
    phase_estimator = cirq.Circuit(cirq.H.on_each(*qubits))

    # Do the controlled powers of the unitary U.
    for i, bit in enumerate(qubits):
        phase_estimator.append(cirq.ControlledGate(U).on(bit, u_bit) ** (2 ** (n_bits - 1 - i)))

    # Do the inverse QFT.
    phase_estimator.append(make_qft_inverse(qubits[::-1]))

    # Add measurements.
    phase_estimator.append(cirq.measure(*qubits, key='m'))

    # Gate to choose initial state for the u_bit. Placing X here chooses the |1> state.
    phase_estimator.insert(0, prepare_eigenstate_gate.on(u_bit))

    # Code to simulate measurements
    sim = cirq.Simulator()
    result = sim.run(phase_estimator, repetitions=n_reps)

    # Convert measurements into estimates of theta
    theta_estimates = np.sum(2**np.arange(n_bits)*result.measurements['m'], axis=1)/2**n_bits

    return theta_estimates

In [None]:
"""Analyze convergence vs n_bits."""
# Set the value of theta. Try different values.
theta = 0.123456

max_nvals = 16
nvals = np.arange(1, max_nvals, step=1)

# Get the estimates at each value of n.
estimates = []
for n in nvals:
    estimate = phase_estimation(theta=theta, n_bits=n, n_reps=1)[0]
    estimates.append(estimate)

print(estimates)

"""Plot the results."""
# Matplotlib 3.6+ renamed this style; older versions call it "seaborn-whitegrid".
plt.style.use("seaborn-v0_8-whitegrid")

plt.plot(nvals, estimates, "--o", label="Phase estimation")
plt.axhline(theta, label="True value", color="black")

plt.legend()
plt.xlabel("Number of bits")
plt.ylabel(r"$\theta$");

*Step 5: Compute the Eigenvalues from the theta value*

Eigenvalue: $e^{2 \pi i \theta}$ is the corresponding eigenvalue, so for $\theta$ = 0.125 $\rightarrow$ $e^{2 * \pi * i * 0.125}$ = <font color="blue">0.707106781 + 0.707106781 i (Eigenvalue of T-gate)</font>

* Verification: $
T|1\rangle=\left[\begin{array}{ll}
1 & 0 \\
0 & e^{\frac{i \pi}{4}}
\end{array}\right]\left[\begin{array}{l}
0 \\
1
\end{array}\right]=e^{\frac{i \pi}{4}}|1\rangle
$ $\rightarrow$ <font color="blue">$e^{\frac{i \pi}{4}}$ is the same as $e^{2 * \pi * i * 0.125}$, i.e. $e^{\frac{2 * \pi * i}{8}}$</font>

* $e^{2 \pi i 0. \varphi_{1} \varphi_{2} \varphi_{3}}$</font> = $e^{2 \pi i 0.(U^{2^0} + U^{2^1} + U^{2^2})}$  = <font color="red">$e^{2 \pi i 0.(001 + 010 + 100)}$</font>  = $e^{2 \pi i 0.111}$

* ps: $e^{2 \pi i}$ =  1 (identity), i.e. the exponential without the phase $\theta$

* From the eigenvalue equation: $Ux =$ <font color="red">$e^{2πi*0.\varphi_{1} \varphi_{2} \cdots \varphi_{n} }$</font> $x$. Example: if $0.\varphi_{1} \varphi_{2} \cdots \varphi_{n} = 0$, then $e^{2πi*0}$ = λ = 1, so that Ux = 1x. Hence λ = 1 is the eigenvalue of $U$.

* https://quantumcomputing.stackexchange.com/questions/9577/how-to-find-eigenvalues-and-eigenvector-for-a-quantum-gate


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_115.png)


###### *Shor's Algorithm*

Shor’s algorithm for factoring (a problem that is not known to be NP-complete) has been run on real quantum computers, but only for very small numbers such as 15 and 21. The largest number reported as factored on a quantum device (that I could find a reference to) is 56153, and that was on 4 qubits. Note that this result was obtained with a minimization-based method rather than with Shor's algorithm (https://phys.org/news/2014-11-largest-factored-quantum-device.html).


(2019) How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits

https://arxiv.org/abs/1905.09749

Craig Gidney co-authored this paper with Martin Ekerå. It estimates the resources needed to factor RSA integers by reducing the problem to computing a short discrete logarithm, which is then solved with quantum period finding. The exponential quantum speedup for factoring and for the discrete logarithm problem (DLP) itself goes back to Shor's work from 1994.

https://quantumai.google/cirq/experiments/shor

Shor's algorithm is best explained using prime factorization. For quantum computers to handle this task, the problem has to be reformulated slightly, because the quantum algorithm relies on a recipe for factoring numbers that dates back to the 1970s. At that time it was found that only four steps are needed to compute the prime factors p and q of a number N = p·q.

1. Choose a random number a < N.

2. Find the period r of a modulo N, i.e. the smallest r > 0 with a^r ≡ 1 (mod N).

3. Make sure that r is even and that (a^(r/2) + 1) is not divisible by N.

4. Then p is the greatest common divisor of (a^(r/2) − 1) and N. The other prime factor q is correspondingly the greatest common divisor of (a^(r/2) + 1) and N.

https://www.spektrum.de/kolumne/shor-algorithmus-wie-quantencomputer-verschluesselungen-knacken/2133048
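The four steps can be tried out classically for small N. The sketch below uses brute-force order finding, which is exactly the part a quantum computer would replace; the function names are hypothetical.

```python
import math

def find_order(a_val, n_val):
    """Smallest r > 0 with a_val**r = 1 (mod n_val). Assumes gcd(a_val, n_val) == 1."""
    r_val, power = 1, a_val % n_val
    while power != 1:
        power = (power * a_val) % n_val
        r_val += 1
    return r_val

def factor_from_order(a_val, n_val):
    """Return (p, q) following the four steps, or None if this a_val does not work."""
    if math.gcd(a_val, n_val) != 1:
        return None                      # a_val already shares a factor with n_val
    r_val = find_order(a_val, n_val)
    if r_val % 2 == 1:
        return None                      # odd period: pick another a_val
    x_val = pow(a_val, r_val // 2, n_val)
    if (x_val + 1) % n_val == 0:
        return None                      # trivial square root: pick another a_val
    return math.gcd(x_val - 1, n_val), math.gcd(x_val + 1, n_val)

print(factor_from_order(13, 15))
print(factor_from_order(18, 77))
```

The brute-force loop in `find_order` takes up to N steps, which is exponential in the number of digits of N. This is the bottleneck that quantum period finding removes.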


**Classical Calculation**

* <font color="blue">Factoring is equivalent to finding a nontrivial square root of 1 mod N.

* All we need to do is find this nontrivial square root of unity, and we can factor whatever number we need. As promised, we can do this with period finding, specifically by computing the order of a random integer.

* The order of some integer x modulo N is the smallest integer r such that $x^r$ = 1 mod N

*Modular Arithmetic*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_088.jpg)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_089.jpg)

**<font color="blue">Step 1: Pick coprime of N**

*Three possible methods for computing the gcd (greatest common divisor):*

First method: Euclidean algorithm
* 15	:	13	  = 	1	  remainder  	2.	  So gcd(15,13) = gcd(13,2)
* 13	:	2	  = 	6	  remainder  	1.	  So gcd(13,2) = gcd(2,1)
* 2	:	1	  = 	2	  remainder  	0.	  So gcd(2,1) = gcd(1,0)
* Result: the gcd of 15 and 13 is 1.

Second method: comparing the sets of divisors.
* The set of divisors of 15 is: {1,3,5,15}.
* The set of divisors of 13 is: {1,13}.
* The largest number that appears in both sets is 1. So 1 is the gcd of 15 and 13.

Third method: comparing the prime factorizations
* The prime factorization of 15 is: 15 = 3·5.
* The prime factorization of 13 is: 13 = 13.
* There are no common prime factors.
* So the gcd is 1.
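The Euclidean algorithm from the first method can be written out so that it prints the same division steps. This is a sketch with a hypothetical function name; in practice `math.gcd` does the same job.

```python
def gcd_euclid(m, n):
    """Greatest common divisor via repeated division with remainder."""
    while n != 0:
        quotient, remainder = divmod(m, n)
        print(f"{m} : {n} = {quotient} remainder {remainder}")
        m, n = n, remainder
    return m

print("gcd =", gcd_euclid(15, 13))
```

Each printed line corresponds to one bullet of the first method, and the loop stops as soon as the remainder is 0.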

*Modulo (short: mod) computes the remainder of the division of two numbers. In mathematical formulas modulo is abbreviated as mod, for example: 23 mod 8 = 7. The result is 7 because 8 fits into 23 twice, which leaves 7.*

In [None]:
# Product of two prime numbers (to check later if result is correct)
N=5 * 3

# Pick coprime (!) number to N to factorize N into primes
a=13

# Code example to understand periodicity in the context of factoring into primes:

import math
# Compute the greatest common divisor of a and N (1 means a and N are coprime)
math.gcd(a, N)

**<font color="blue">Step 2: Find the period of $a^r$ $\equiv$ 1 $(modN)$:**

* <font color="blue">the order of x is just the period of the function f(i) = $x^i$ mod N.

* <font color="blue">In quantum computing you use QFT in order to determine the period !!

In [None]:
import matplotlib.pyplot as plotter
import seaborn as sns
sns.set(rc={'figure.figsize': (12, 5), "lines.linewidth": 1.5})

# Remainders of a^r mod N for r = 0 .. N-1 (pow with a modulus avoids huge integers)
r = list(range(N))
y = [pow(a, r0, N) for r0 in r]

plotter.plot(r, y)
plotter.xlabel('r')
plotter.ylabel(f'Remainder: {a}^r (mod {N})')
plotter.title('Period of the remainders (over the exponents r)')
plotter.show()

<font color="red">**Example: choose a number $a$ and raise it to successive powers $r$ until the remainder modulo $N$ is 1 (excluding r=0)**</font>

> $13^0$ (mod 15) = 1 (mod 15) = 1

> $13^1$ (mod 15) = 13

> $13^2$ (mod 15) = 169 (mod 15) = 4

* <font color="blue">*Explanation: 15 * 11 = 165, which leaves a remainder of 4 up to 169*

> $13^3$ (mod 15) = 2197 (mod 15) = 7

* <font color="blue">*Explanation: 15 * 146 = 2190, which leaves a remainder of 7 up to 2197*

> $13^4$ (mod 15) = 28561 (mod 15) = 1 (<font color="blue"><u>here the period starts again, that's the r we are looking for!</u>)

> etc.

In [None]:
# The period is the first exponent > 0 whose remainder is 1 (r becomes an int here)
r = y[1:].index(1) + 1
print(f'r = {r}')

**<font color="blue">Step 3: Compute $x \equiv a^{\frac{r}{2}}(\operatorname{mod} N)$**. At least one of the two prime factors of N = p·q is contained in gcd(x+1, N) or gcd(x-1, N)

*In this case with a=13, N=15 and r=4:*

* $x \equiv a^{\frac{r}{2}}(\operatorname{mod} N)$

* $x \equiv 13^{\frac{4}{2}}(\operatorname{mod} 15)$

* x = 169 (mod 15) = 4

  * gcd(x-1, N) = 3 = p

  * gcd(x+1, N) = 5 = q

Note: in another example, N=11*7 (both prime) and a=18 give x=43.

* This gives x-1=42 and x+1=44.
* These are of course not prime numbers,
* but their factorizations are: 44 = 2 * 2 * 11 and 42 = 2 * 3 * 7
* that is, x-1 and x+1 can also contain the prime factors indirectly, which is why we take the gcd with N!

In [None]:
if r % 2 == 0:
    # Integer modular exponentiation; a float exponent loses precision for large numbers
    x = pow(a, r // 2, N)
    print(f'x = {x}')
    if (x + 1) % N != 0:
        print(math.gcd(x + 1, N), math.gcd(x - 1, N))
    else:
        print("x + 1 is 0 (mod N)")
else:
    print(f'r = {r} is odd')

**Shor's Algorithm**

* When finding order using the period finding algorithm, it is important to use enough qubits. A sensible rule is that you need to use m qubits so that $2^m$ >> $N^2$, where N is the number we are trying to factor, because the order of a random number might be as large as N

* Example: Let's factor N=119. Suppose we pick the number 16 to start with. How many qubits m should we use at least? $N^2$ = $119^2$ = 14,161, and $2^m$ has to be larger than that, so at least $2^{14}$ = 16,384. We therefore need at least 14 qubits to factor 119.
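The qubit count from this rule can be computed with integer arithmetic. The helper below is a sketch with a hypothetical name; it returns the smallest m with $2^m \geq N^2$, which is a lower bound for the rule $2^m \gg N^2$ rather than a recommendation.

```python
def min_counting_qubits(n_val):
    """Smallest m with 2**m >= n_val**2 (integer arithmetic, no rounding issues)."""
    return (n_val * n_val - 1).bit_length()

m_119 = min_counting_qubits(119)
print(m_119, 2 ** m_119, 119 ** 2)
```

This counts only the register used for period finding; the register that holds $a^x \bmod N$ needs additional qubits.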

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_090.jpg)

* Because we know that the order r of x will be even and $x^{r/2}$ will be a nontrivial square root with probability at least 1/2, we can be confident that we will be able to factor N in just a few runs of the algorithm. Because the time it takes to find the period grows as a polynomial in the number of bits, and the number of bits grows like 2 log N (by the above requirement), we expect the time it takes to factor N to grow as a polynomial in log N.

* Here is the circuit for Shor’s Algorithm. It relies heavily on period finding, and so the circuit looks a lot like the circuit for period finding. The key difference is that we are finding the period of $f(i) = x^i \bmod N$, and the number of bits we need to input is very large.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_091.jpg)

**How does it work in the quantum circuit?**

That's the function in $U$: given an $x$, the $U$ will compute:

> $f_{a, N}(x) \equiv a^{x}(\bmod N)$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_092.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_093.png)

**<font color="blue">Shor's Algorithm: Step by Step**

**Example: a=13 and N=15. What exactly does Shor's algorithm do in the circuit at $U_{f_{(a,N)}}$ and $QFT^{\dagger}$?**

ps: a must be coprime to N. If it is not, we do not need to go through Shor's algorithm at all, because a then shares a factor with N :) But for a large number N it is very unlikely to pick such an a by chance.

**First let's divide it into steps. 1-5:**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_095.png)

**Step 1**: Get Qubits in state 0 and apply Hadamard Superposition

We start with two registers, x and w, of 4 qubits each, all in state zero. Each register is a 4-fold tensor product because it has 4 qubits:

> $|0\rangle_{x}^{\otimes 4}$ $|0\rangle_{w}^{\otimes 4}$

Hadamard gates are applied to all of the top 4 qubits (x register), and the right part (w register) gets nothing applied to it:

> $[H^{\otimes 4}|0\rangle^{\otimes 4}] \,\, |0\rangle^{\otimes 4}$

> = $\frac{1}{4}[|0\rangle+|1\rangle+|2\rangle+\cdots+|15\rangle]$ $|0\rangle$

* Reminder 1: the factor $\frac{1}{4}$ = $\left(\frac{1}{\sqrt{2}}\right)^{4}$ appears because 4 qubits are in Hadamard superposition

* Reminder 2: each ket is the 4-bit representation of the decimal number, so for example 15 in binary = 1111. The subscript 4 can therefore be added as a reminder of the bit representation:

> = $\frac{1}{4}[|0\rangle_4+|1\rangle_4+|2\rangle_4+\cdots+|15\rangle_4]$ $|0\rangle_4$
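The uniform superposition with amplitude $\frac{1}{4}$ can be confirmed numerically. A minimal NumPy sketch (variable names are illustrative) that builds $H^{\otimes 4}$ and applies it to $|0000\rangle$:

```python
import numpy as np

h_gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Build H (x) H (x) H (x) H by repeated Kronecker products.
h4 = h_gate
for _ in range(3):
    h4 = np.kron(h4, h_gate)

zero4 = np.zeros(16)
zero4[0] = 1.0                     # |0000>

x_register = h4 @ zero4
print(x_register)                  # 16 equal amplitudes, one per value 0..15
```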


**Step 2**: Compute $U$ with $f_{a, N}(x) \equiv a^{x}(\bmod N)$ - what exactly happens in the box labelled $U_{f_{(a,N)}}$?

**Given an $x$, the $U$ will compute: <font color="red">$f_{a, N}(x) \equiv a^{x}(\bmod N)$</font>**

Look at the state from the previous step again, with one component highlighted:

> = $\frac{1}{4}[$ <font color="red">$|0\rangle_4$</font> $+|1\rangle_4+|2\rangle_4+\cdots+|15\rangle_4]$ $\,$ $|0\rangle_4$

<font color="red">$U_{f_{(a,N)}}$</font> then does the following:

> = $\frac{1}{4}$ <font color="red">[$|0\rangle_{4}\, \left|  0 \bigoplus 13^{0}(\bmod 15)\right\rangle_{4}$</font> + $|1\rangle_{4}\left|0 \bigoplus 13^{1}(\bmod 15)\right\rangle_{4}$ + $|2\rangle_{4}\left|0 \bigoplus 13^{2}(\bmod 15)\right\rangle_{4}$ + $|3\rangle_{4}\left|0 \bigoplus 13^{3}(\bmod 15)\right\rangle_{4}$ etc..]

Remember: $\bigoplus$ means addition modulo 2, i.e. bitwise XOR. Anything XORed with 0 is itself: 0 $\bigoplus$ Z = Z. This gives:

> = $\frac{1}{4}$ <font color="red">[$|0\rangle_{4}\, \left|   13^{0}(\bmod 15)\right\rangle_{4}$</font> + $|1\rangle_{4}\left| 13^{1}(\bmod 15)\right\rangle_{4}$ + $|2\rangle_{4}\left| 13^{2}(\bmod 15)\right\rangle_{4}$ + $|3\rangle_{4}\left| 13^{3}(\bmod 15)\right\rangle_{4}$ etc..]


The modular arithmetic gives the remainders:

* <font color="red">$13^{0}(\bmod 15)$ = 1</font>

* $13^{1}(\bmod 15)$ = 13

* $13^{2}(\bmod 15)$ = 4

* $13^{3}(\bmod 15)$ = 7

* $13^{4}(\bmod 15)$ = 1

* etc.

Since it's periodic, it will repeat, with the x and w register:

> = $\frac{1}{4}$ <font color="red">[$|0\rangle_{4}\,\left|1\right\rangle_{4}$</font> + $|1\rangle_{4}\left|13\right\rangle_{4}$ + $|2\rangle_{4}\left|4\right\rangle_{4}$ + $|3\rangle_{4}\left|7\right\rangle_{4}$ etc..]

Here is the same state written out term by term, grouped by remainder, for easier visualisation:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_094.png)
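The remainders in Step 2 can be tabulated classically. This is a small sketch using only Python's built-in three-argument `pow`; the brute-force period search is for illustration only and is exactly the part that the quantum circuit replaces for large N.

```python
a, N = 13, 15
n_qubits = 4

# f(x) = a^x mod N for every basis state x of the 4-qubit x register.
table = {x: pow(a, x, N) for x in range(2 ** n_qubits)}
for x, fx in table.items():
    print(f"|{x}>|{fx}>")

# The period r is the smallest r > 0 with a^r = 1 (mod N).
r = next(k for k in range(1, N) if pow(a, k, N) == 1)
print("period r =", r)
```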

**Step 3: Measurement of the w register / bottom 4 Qubits**

* the outcomes of the w-register measurement are 1, 13, 4 or 7 (the remainders), each with equal probability

* let's say we measure 7. What happens to x? The x register collapses onto the values that were paired with 7, namely 3, 7, 11 and 15, in equal superposition:

  * after measuring $|\omega\rangle$ = $|7\rangle_4$ , the state becomes:

  * <font color="blue">$|x\rangle$ $|\omega\rangle$ = $\frac{1}{2}\left[|3\rangle_{4}+|7\rangle_{4}+|11\rangle_{4}+ |15 \rangle_{4}\right]$ $\otimes |7\rangle_4$

  * The normalization has changed: before we had 16 terms with amplitude 1/4, now we have only 4 terms with amplitude 1/2 (one over the square root of 4)

* **For step 4 the remainder, here $\otimes |7\rangle_4$, no longer matters and we can ignore it, because in step 4 we apply $QFT^{\dagger}$ to the $|x\rangle$ register only and do not care about $|\omega\rangle$ anymore**. In this case $|x\rangle$ is: $\frac{1}{2}\left[|3\rangle_{4}+|7\rangle_{4}+|11\rangle_{4}+ |15 \rangle_{4}\right]$
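The collapse in Step 3 can be mimicked classically. The sketch below assumes the outcome 7 for the w register, as in the text, and simply keeps the matching x values and renormalises; it is a bookkeeping illustration, not a quantum simulation.

```python
import numpy as np

a, N, dim = 13, 15, 16
measured_w = 7  # assumed measurement outcome of the w register

# Keep only those x whose function value matches the measured w.
surviving_x = [x for x in range(dim) if pow(a, x, N) == measured_w]

# Renormalise: each surviving basis state gets amplitude 1/sqrt(count).
x_state = np.zeros(dim)
x_state[surviving_x] = 1 / np.sqrt(len(surviving_x))
print(surviving_x, x_state[surviving_x])
```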

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_095.png)

*Aside: a complex number $z=a+b i$ and its complex conjugate $\bar{z}=a-b i$*:

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Komplexe_konjugation.svg/294px-Komplexe_konjugation.svg.png)

Changing the sign of the imaginary part $b$ of a complex number

> $z=a+b \mathrm{i}$

gives the complex conjugate of $z$,

> $\bar{z}=a-b \mathrm{i}$

(sometimes also written $z^{*}$).

Conjugation $\mathbb{C} \rightarrow \mathbb{C}, z \mapsto \bar{z}$ is an (involutive) field automorphism, since it is compatible with addition and multiplication, i.e. for all $y, z \in \mathbb{C}$

>$
\overline{y+z}=\bar{y}+\bar{z}, \quad \overline{y \cdot z}=\bar{y} \cdot \bar{z}
$

In polar form, the complex conjugate $\bar{z}$ has the same modulus as $z$ and exactly the negative angle.

* **Conjugation can therefore be interpreted as a reflection across the real axis in the complex plane**.

* <font color="blue">**In particular, the real numbers are exactly the numbers that conjugation maps to themselves**.

The product of a complex number $z=a+b i$ and its complex conjugate $\bar{z}$ is the square of its modulus:

> $
z \cdot \bar{z}=(a+b i)(a-b i)=a^{2}+b^{2}=|z|^{2}
$

The complex numbers thus form a trivial example of a [C*-algebra](https://de.m.wikipedia.org/wiki/C*-Algebra).
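These properties of conjugation are easy to check with Python's built-in complex type. The numbers below are arbitrary examples picked for this sketch.

```python
z = 3 + 4j
z_bar = z.conjugate()

print(z_bar)                   # mirror image of z across the real axis
print(z * z_bar, abs(z) ** 2)  # product equals the squared modulus

y = 1 - 2j
# Conjugation is compatible with addition and multiplication.
sum_ok = (y + z).conjugate() == y.conjugate() + z.conjugate()
prod_ok = (y * z).conjugate() == y.conjugate() * z.conjugate()
print(sum_ok, prod_ok)
```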

**Step 4**: Apply the inverse transform $QFT^{\dagger}$ to the $|x\rangle$ register

* $QFT\,\,|x\rangle=|\tilde{x}\rangle=$ $\frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{\frac{2 \pi i}{N} x y} |y\rangle$ (Reminder!)

* $QFT^{\dagger}|\tilde{x}\rangle=|x\rangle$, and on a basis state $QFT^{\dagger}|x\rangle=$ $\frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{\frac{-2 \pi i}{N} x y} |y\rangle$ (the sign of the exponent flips, turning $i$ into $-i$, which is a **complex conjugate operation**). Here $N = 2^4 = 16$ is the dimension of the x register, not the number being factored.

* We want to know what $QFT^{\dagger}$ does to each basis state in the superposition (the prefactor is $\frac{1}{\sqrt{16}}$ because we have 4 qubits):

  * $QFT^{\dagger}|3\rangle_4$ = $\frac{1}{\sqrt{16}} \sum_{y=0}^{15} e^{\frac{-2 \pi i 3 y}{16}}|y\rangle$

  * $QFT^{\dagger}|7\rangle_4$ = $\frac{1}{\sqrt{16}} \sum_{y=0}^{15} e^{\frac{-2 \pi i 7 y}{16}}|y\rangle$

  * $QFT^{\dagger}|11\rangle_4$ = $\frac{1}{\sqrt{16}} \sum_{y=0}^{15} e^{\frac{-2 \pi i 11 y}{16}}|y\rangle$

  * $QFT^{\dagger}|15\rangle_4$ = $\frac{1}{\sqrt{16}} \sum_{y=0}^{15} e^{\frac{-2 \pi i 15 y}{16}}|y\rangle$

Altogether:

  * $QFT^{\dagger}|x\rangle$ = $\frac{1}{{8}} \sum_{y=0}^{15}$ [ $e^{-i\frac{ 3 \pi}{8}y}$ + $e^{-i\frac{ 7 \pi}{8}y}$ + $e^{-i\frac{ 11 \pi}{8}y}$ + $e^{-i\frac{ 15 \pi}{8}y}$] $|y\rangle$

    * with: $e^{-i\frac{ 3 \pi}{8}y}$ = $\cos \left(\frac{3 \pi}{8} y\right)-i \sin \left(\frac{3 \pi}{8} y\right)$ (and equivalently for the other three)

    * see the code calculation below for what exactly happens here

  * <font color="blue">$QFT^{\dagger}|x\rangle$ = $\frac{1}{{8}}$ [ $4|0\rangle_4$ + $4i|4\rangle_4$ $-4|8\rangle_4$ $-4i|12\rangle_4$ ]</font>

  * Remember we had a sum before: $\frac{1}{{8}} \sum_{y=0}^{15}$. And notice how all the other terms now vanished to zero, because you had equal contributions of plus and minus.

    * **This is exactly what it means when people tell you that quantum computers take advantage of interference!! = when a lot of the terms vanish, and the answer only converges to the terms that we care about.**

    * here is the calculation what happened, you see many zeros:

<font color="red">Worked example for y=1, to understand the vanishing components</font>. The code below prints the results for every y; for y=1 the coefficient of $QFT^{\dagger}|x\rangle$ is 0:

  * $e^{-i\frac{ 3 \pi}{8}y}$ + $e^{-i\frac{ 7 \pi}{8}y}$ + $e^{-i\frac{ 11 \pi}{8}y}$ + $e^{-i\frac{ 15 \pi}{8}y}$ =

  * $e^{-i\frac{ 3 \pi}{8}1}$ + $e^{-i\frac{ 7 \pi}{8}1}$ + $e^{-i\frac{ 11 \pi}{8}1}$ + $e^{-i\frac{ 15 \pi}{8}1}$ =

    * $e^{-i\frac{ 3 \pi}{8}1}$ = <font color="green">0.382683432 - 0.923879533 i</font>

    * $e^{-i\frac{ 7 \pi}{8}1}$ = <font color="orange">-0.923879533 - 0.382683432 i</font>

    * $e^{-i\frac{ 11 \pi}{8}1}$ = <font color="green">-0.382683432 + 0.923879533 i</font>

    * $e^{-i\frac{ 15 \pi}{8}1}$ = <font color="orange">0.923879533 + 0.382683432 i</font>

  * As you can see, the terms with the same colour cancel each other, which is why the result for y=1 is zero.

In [None]:
# Example for y=1 and the first exponential term only:
import numpy as np

y = 1
pi = np.pi
coeff = np.exp(-1j * 3 * pi / 8 * y)
if abs(coeff) < 1e-10:
    coeff = 0  # treat floating-point noise as an exact zero
print(y, coeff)

In [None]:
# Full calculation for all y (0..15) and all four exponential terms:
import numpy as np

pi = np.pi
for y in range(16):  # 16 basis states, so y runs from 0 to 15 inclusive
    coeff = (np.exp(-1j * 3 * pi / 8 * y)
             + np.exp(-1j * 7 * pi / 8 * y)
             + np.exp(-1j * 11 * pi / 8 * y)
             + np.exp(-1j * 15 * pi / 8 * y))
    if abs(coeff) < 1e-10:
        coeff = 0  # treat floating-point noise as an exact zero
    print(y, coeff)

**Step 5: Measure the |x> register**

* You get either 0 or 4 or 8 or 12 with equal probability

* Remaining steps are classical post-processing

* You can already see the periodicity in the result: the difference is always 4

* Analyse what happens for each outcome: **The measurement results peak near $j\frac{N}{r}$ for some integer $j \in \mathbb{Z}$, where r is the period we are looking for and $N = 2^n$ for n qubits (here N = 16).**

  * if we measure |4>$_4$: $j\frac{16}{r}$ = 4, true if j=1 and r=4

  * there are multiple values that would work, but this is the lowest one

* now check our protocol for r=4:

  * Is r even? yes!

  * $x \equiv a^{r / 2}(\bmod N)$ = $13^{4 / 2}(\bmod 15)$ = 4

  * x+1 = 5 and x-1 = 3

* This looks good, now check:

  * $\operatorname{gcd}(x+1, N)=\operatorname{gcd}(5,15)=5$

  * $\operatorname{gcd}(x-1, N)=\operatorname{gcd}(3,15)=3$

What do you do if you measure 8?

* |8>$_4$: $j\frac{16}{r}$ = 8, true if j=1 and r=2, and also if j=2 and r=4

* if r=4 we are back in the case before

* if r=2 then $x \equiv a^{r / 2}(\bmod N)$ = $13^{2 / 2}(\bmod 15)$ = 13, which gives x+1 = 14 and x-1 = 12

  * $\operatorname{gcd}(x+1, N)=\operatorname{gcd}(14,15)=1$

  * $\operatorname{gcd}(x-1, N)=\operatorname{gcd}(12,15)=3$

* This leads you to a partial solution. You can recover the other factor by dividing: 15 / 3 = 5

* If we measure 0, we learn nothing about r and need to run the experiment again
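The classical post-processing can be sketched as follows. The helper name `factors_from_measurement` is made up for this example, and `Fraction.limit_denominator` is used here as a stand-in for the continued-fraction step that extracts a candidate r from the ratio y/16; it is an assumption of this sketch, not something the notes prescribe.

```python
from fractions import Fraction
from math import gcd

def factors_from_measurement(y, a=13, N=15, dim=16):
    """Try to turn one measured value y into non-trivial factors of N."""
    if y == 0:
        return None  # no information about the period; repeat the run
    # y/dim is close to j/r, so the reduced denominator is a candidate r.
    r = Fraction(y, dim).limit_denominator(N).denominator
    if r % 2 != 0:
        return None  # the protocol needs an even candidate
    x = pow(a, r // 2, N)
    # Keep only the non-trivial divisors among gcd(x + 1, N) and gcd(x - 1, N).
    found = {gcd(x + 1, N), gcd(x - 1, N)} - {1, N}
    return r, sorted(found)

for y in (0, 4, 8, 12):
    print(y, factors_from_measurement(y))
```

The loop runs through the four possible outcomes of Step 5, so each case discussed above can be compared with the printed candidate r and the factors found.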

Here are the factorisation results for the different quantum computer outputs. An output of 0 does not work, so we can factor in 3 out of 4 cases (an output of 8 gives only a partial solution, but we can still factor).

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_097.png)

From IBM's 2001 paper on factoring 15 on a quantum computer:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_096.png)

**Appendix: What is the Gate structure in $U$?**

* $a^{x_1}$, $a^{x_2}$, $a^{x_n}$ tells you this is a controlled operation

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_098.png)

* notice that the exponent no longer contains $x_1$, $x_2$, .. $x_n$

* instead, the bits of x act as control qubits on the individual operations

* this is exactly the structure of quantum phase estimation

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_099.png)

**The left term comes from QPE; the right term is the $U$ part of Shor's algorithm:**

> <font color="blue">$U^{2^{x}}=a^{2^{x}}(\bmod N)$</font>
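The idea behind the controlled powers can be illustrated classically: $a^x \bmod N$ is a product of factors $a^{2^k}$, one for each set bit $x_k$ of x. The function name below is hypothetical and the sketch is ordinary square-and-multiply, not a gate-level construction.

```python
def modexp_by_controlled_squares(a, x, N, n_bits=4):
    """Compute a^x mod N as a product of factors a^(2^k), one per set bit of x."""
    result = 1
    power = a % N            # holds a^(2^k) mod N for the current k
    for k in range(n_bits):
        if (x >> k) & 1:     # bit x_k plays the role of the control qubit
            result = (result * power) % N
        power = (power * power) % N  # next factor: a^(2^(k+1)) mod N
    return result

print([modexp_by_controlled_squares(13, x, 15) for x in range(16)])
```

Each loop iteration corresponds to one controlled block in the circuit: the block is applied only if its control bit is 1.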

continue: https://youtu.be/IFmkzWF-S2k?t=1181

###### *Grover Search*

https://web.archive.org/web/20171221161408/http://twistedoakstudios.com/blog/Post2644_grovers-quantum-search-algorithm

https://www.lesswrong.com/posts/5vZD32EynD9n94dhr/configurations-and-amplitude

> **[Grover’s algorithm](https://en.m.wikipedia.org/wiki/Grover's_algorithm) teaches us how we can search for an item in an unsorted list without needing to look at each item one by one but by looking at them all at once.**

It accomplishes that using two techniques:

  * First, it uses a quantum oracle to mark the searched state.

  * Second, it uses a diffuser that amplifies the amplitude of the marked state to increase its measurement probability.

Grover's algorithm: search in large databases (quadratic speedup: the result is obtained in roughly the square root of the time needed on classical computers)

* On a classical computer, the fastest possible search algorithm for an unsorted database is [linear search](https://de.m.wikipedia.org/wiki/Lineare_Suche), which requires ${\mathcal {O}}\left(N\right)$ steps (the search effort grows linearly with the number of elements in the list).

* The more efficient [binary search](https://de.m.wikipedia.org/wiki/Binäre_Suche) can only be used on sorted lists. Binary search is considerably faster than linear search, which however has the advantage of also working on unsorted arrays. In special cases, [interpolation search](https://de.m.wikipedia.org/wiki/Interpolationssuche) can be faster than binary search.

* Makes use of Amplitude Amplification, Quantum Walk & Quantum Counting

> Grover's Algorithm uses a phase shift to increase the amplitude of the favorable state and to decrease the amplitudes of all other states (=Phase Flip of Desired Outcome + Probability Amplitudes Inversion about the Mean to Amplify)

Grover’s algorithm solves oracles that **add a negative phase to the solution states**. I.e. for any state |x⟩ in the computational basis:

> $U_{\omega}|x\rangle=\left\{\begin{aligned}|x\rangle & \text { if } x \neq \omega \\-|x\rangle & \text { if } x=\omega \end{aligned}\right.$

We create a function $f$ that takes a proposed solution $x$, and returns

* $f(x)=0$ if $x$ is not a solution ( $x \neq \omega)$

* $f(x)=1$ for a valid solution $(x=\omega)$.

The oracle can then be described as:

> $U_{\omega}|x\rangle=(-1)^{f(x)}|x\rangle$

* you can see this is an Eigenvalue equation

The oracle's matrix will be a diagonal matrix of the form:

> $U_{\omega}=\left[\begin{array}{cccc}(-1)^{f(0)} & 0 & \cdots & 0 \\ 0 & (-1)^{f(1)} & \cdots & 0 \\ \vdots & 0 & \ddots & \vdots \\ 0 & 0 & \cdots & (-1)^{f\left(2^{n}-1\right)}\end{array}\right]$

*Source: https://qiskit.org/textbook/ch-algorithms/grover.html*
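The diagonal oracle can be written down directly for a small case. This is a minimal NumPy sketch assuming 3 qubits and the marked state $|111\rangle$; both choices are only for illustration.

```python
import numpy as np

n_qubits = 3
omega = 0b111  # assumed marked state, chosen only for illustration

def f(x):
    """Indicator function: 1 for the solution, 0 otherwise."""
    return 1 if x == omega else 0

# Diagonal oracle with entries (-1)^{f(x)}.
U_omega = np.diag([(-1.0) ** f(x) for x in range(2 ** n_qubits)])

# Apply it to the uniform superposition: only the marked amplitude flips sign.
s = np.full(2 ** n_qubits, 1 / np.sqrt(2 ** n_qubits))
print(U_omega @ s)
```

Because the matrix is diagonal with entries ±1, every computational basis state is an eigenvector, which is the eigenvalue-equation remark above.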

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_105.jpg)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_106.jpg)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_107.jpg)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_108.jpg)

*Step 1: Hadamard-Operator for Superposition + Assign a Phase -1 to the desired outcome*

Start with a balanced superposition and assign a phase of -1 to the chosen ket, $|111\rangle$. Assigning -1 means applying a Pauli-Z-type phase flip to this one component of the superposition.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_102.jpg)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_101.jpg)

**<font color="blue">More about Amplitude Amplification**

* [Amplitude amplification](https://en.m.wikipedia.org/wiki/Amplitude_amplification) is a technique in quantum computing which **generalizes the idea behind the Grover's search algorithm**, and gives rise to a family of quantum algorithms.

* In a quantum computer, amplitude amplification can be used to **obtain a quadratic speedup over several classical algorithms**.

If there are $G$ good entries in the database in total, then we can find them by initializing a quantum register $|\psi\rangle$ with $n$ qubits, where $2^{n}=N$, into a uniform superposition of all $N$ database elements such that

>$| \psi \rangle=\frac{1}{\sqrt{N}} \sum_{k=0}^{N-1}|k\rangle
$

and running the above algorithm. In this case the overlap of the initial state with the good subspace is equal to the square root of the frequency of the good entries in the database, $\sin (\theta)=|P|\psi\rangle|=\sqrt{G / N}$. If $\sin (\theta) \ll 1$, we can approximate the number of required iterations as

>$
n=\left\lfloor\frac{\pi}{4 \theta}\right\rfloor \approx\left\lfloor\frac{\pi}{4 \sin (\theta)}\right\rfloor=\left\lfloor\frac{\pi}{4} \sqrt{\frac{N}{G}}\right\rfloor=O(\sqrt{N})
$

Measuring the state will now give one of the good entries with high probability.

Since each application of $S_{P}$ requires a single oracle query (assuming that the oracle is implemented as a quantum gate), we can find a good entry with just $O(\sqrt{N})$ oracle queries, thus obtaining a quadratic speedup over the best possible classical algorithm. (The classical method for searching the database would be to perform the query for every $e \in\{0,1, \ldots, N-1\}$ until a solution is found, thus costing $O(N)$ queries.) Moreover, we can find all $G$ solutions using $O(\sqrt{G N})$ queries.
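The iteration count from the formula above is easy to evaluate. A short sketch using only the standard library; the function name `grover_iterations` is chosen here for illustration.

```python
import math

def grover_iterations(N, G=1):
    """Approximate iteration count floor(pi / (4 * theta)) with sin(theta) = sqrt(G/N)."""
    theta = math.asin(math.sqrt(G / N))
    return math.floor(math.pi / (4 * theta))

for N in (16, 1024, 10**6):
    print(N, grover_iterations(N))
```

The printed counts grow like $\sqrt{N}$, in contrast to the $O(N)$ queries of a classical scan.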

* Now, let’s say four qubits are enough and Mr. Grover is known as |0010⟩. The oracle uses the specific characteristic of this state to identify it: the state has a |1⟩ at the third position and |0⟩ everywhere else.

* Since the quantum oracle takes all qubits as input, it can easily apply a transformation of this exact state. It doesn’t matter whether we use four qubits or 33. The oracle identifies Mr. Grover in a single turn.

* The transformation the oracle applies to the searched state is an inversion of the amplitude.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_109.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_110.png)

* In the representation of the amplitudes above (the dark one), we can clearly see a difference between the searched state and all the other states. We might be tempted to declare the search over.

* But the only difference is in the sign of the amplitude. Because the measurement probability is the absolute square of the amplitude, the sign does not matter at all.

* The amplitude originates from the concept that every quantum entity may be described not only as a particle but also as a wave. The main characteristic of a wave is that it goes up and down as it moves. The amplitude is the distance between the center and the crest of the wave.

* If we invert the amplitude of a wave at all positions, the result is the same wave shifted by half of its wavelength.

* These two waves differ only in their relative position. This is the phase of the wave. For the outside world, the phase of a wave is not observable. Observed individually, the two waves appear identical. So, the problem is we can’t tell the difference between these two waves.

> As a consequence, the system does not appear any different from the outside. Even though the oracle marked the searched state and it, therefore, differs from the other states, all states still have the same measurement probability.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_111.png)


*Step 2: Invert all probability amplitudes about the mean + Measure*

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_103.jpg)

* We need to turn the difference into something measurable. We need to increase the measurement probability of the marked state. This is the task of the diffuser. The diffuser applies an inversion about the mean amplitude.

* Let’s have a look at the average amplitude.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_112.png)

* With four qubits, we have 16 different states. Each state has an amplitude of 1/sqrt(16)=1/4. In fact, every state except the searched one has this amplitude. The searched state has an amplitude of −1/4. Thus, the average is (15∗1/4−1/4)/16=0.21875.

* The average is a little less than the amplitude of all states we did not mark. If we invert these amplitudes about this mean, they end up a little lower than the average, at 0.1875.

* Because the amplitude of the marked state is negative, it is quite far away from the average, so the inversion about the mean has a greater effect. It moves the amplitude from −0.25 by 2∗(0.25+0.21875) to 0.6875 (equivalently: 0.21875−(−0.25−0.21875)).

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_113.png)
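The numbers above can be reproduced with a few lines of NumPy. The sketch assumes 16 states and uses index 2 for the marked state $|0010\rangle$; reflecting about the mean is written as `2 * mean - amps`.

```python
import numpy as np

N = 16
marked = 2  # index of |0010>, used here only as an example

amps = np.full(N, 1 / np.sqrt(N))
amps[marked] *= -1             # oracle: flip the sign of the marked amplitude

mean = amps.mean()
amps_after = 2 * mean - amps   # diffuser: reflect every amplitude about the mean

print("mean:", mean)
print("unmarked:", amps_after[0], "marked:", amps_after[marked])
print("P(marked):", amps_after[marked] ** 2)
```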

* The inversion about the mean works well if we search for a single or a few negative amplitudes among many positive amplitudes. Then, this operation increases the negative amplitudes we know are the correct ones. And this operation decreases the positive amplitudes, we know are wrong.

> This operation increases the negative amplitude by a large amount while decreasing the positive amplitudes by a small amount.

* But the more states we have, the lower the overall effect will be. In our example, we calculated the new amplitude of the searched state as 0.6875. The corresponding measurement probability is 0.6875^2=0.47265625. Accordingly, we find the system in the state we are looking for only about every other time we measure; otherwise we find it in one of the other states.

* Of course, we could now measure the system many times and see our searched state as the most probable one. But running the algorithm so many times would give away any advantage we gained from not searching all the states.

> **Instead, we repeat the algorithm. We use the same oracle to negate the amplitude of the searched state. Then we invert all the amplitudes around the mean, again**.

However, we must not repeat this process too many times. There is an optimal number of times of repeating this process to get the greatest chance of measuring the correct answer.

* The probability of obtaining the correct result grows until we reach about $\frac{\pi}{4}\sqrt{N}$ iterations, where N is the number of states of the quantum system. Beyond this number, the probability of measuring the correct result decreases again.

* In our example with four qubits and N=16 states, the optimum number of iterations is 3.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_104.jpg)
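Repeating the oracle and the inversion about the mean shows where the success probability peaks. This is a classical state-vector sketch under the same assumptions as before (16 states, one marked state at index 2).

```python
import numpy as np

N, marked = 16, 2
amps = np.full(N, 1 / np.sqrt(N))

probs = []
for k in range(1, 7):
    amps[marked] *= -1               # oracle
    amps = 2 * amps.mean() - amps    # inversion about the mean
    probs.append(amps[marked] ** 2)
    print(k, round(probs[-1], 4))

best = int(np.argmax(probs)) + 1
print("best number of iterations:", best)
```

Running more iterations than `best` rotates the state past the marked one, which is why the probability drops again.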

https://towardsdatascience.com/towards-understanding-grovers-search-algorithm-2cdc4e885660

*Grover Search: Simple Examples (Z-Gate and I-Gate as amplifiers. with H as Diffuser)*

**Example 1: Z Gate to switch from 0 to 1**

* Let’s say the state |1⟩ depicts the favorable state we want to find. Then, the oracle consists of the Z-gate that switches the amplitude when the corresponding qubit is in state |1⟩.

* As a result, we see the amplitude changed for state |1⟩. The qubit is now in state |−⟩. Its two states |0⟩ and |1⟩ are in two different phases, now.

* In other words, we flipped the amplitude of state |1⟩ from positive to negative.

* Both states still have a measurement probability of 0.5. It is the task of the diffuser to magnify the amplitude to favor the searched state.

* **The diffuser in a single-qubit circuit is quite simple. It is another H-gate**. This circuit results in state |1⟩ with absolute certainty.

We apply an important sequence to the qubit, the HZH-circuit. This circuit is equivalent to the NOT-gate (X-gate), which turns state |0⟩ into |1⟩ and vice versa.

The following equation proves this identity.

> $H Z H=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]\left[\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right] \frac{1}{\sqrt{2}}\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]=\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right]=X$
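The identity can be verified numerically. A minimal NumPy sketch with the standard matrix forms of H, Z and X; the identity oracle `I` is included only to check the companion identity HIH=I.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]])
X = np.array([[0, 1], [1, 0]])
I = np.eye(2)

print(np.allclose(H @ Z @ H, X))  # oracle Z marks |1>
print(np.allclose(H @ I @ H, I))  # identity oracle marks nothing

# Running the H-Z-H circuit on |0> ends in |1>.
final_state = H @ Z @ H @ np.array([1.0, 0.0])
print(np.round(final_state, 10))
```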

Then, why would we use the HZH-sequence? If it is similar to the NOT-gate, why don’t we use that instead?

<font color="red">Simply put, the HZH-sequence is more flexible. It is the simplest form of Grover’s search algorithm.</font> It starts with all states being equal (the first H-gate). It applies an oracle (Z-gate).

And, <font color="blue">**it uses a diffuser that amplifies the amplitude of the selected state |1⟩ (the second H-gate)**.

> While we could rewrite these two circuits more succinctly, the circuit identities of HZH=X and HIH=I let us use the general structure of Grover’s algorithm. Simply by changing the oracle, we can mark and amplify different states. We don’t need to come up with a new algorithm for each possible state we want to select out of a list. But we only need to find an appropriate oracle. This ability comes in handy the more states our quantum system has.

> The search for one of two possible states does not even deserve to be called a search. But the general structure of Grover’s algorithm is not different from this very simple example. **It uses a phase shift to increase the amplitude of the favorable state and to decrease the amplitudes of all other states**.



**Quantum Counting**

> **In quantum counting, we simply use the quantum phase estimation algorithm to find an eigenvalue of a Grover search iteration.**

* how many solutions exist?

* does any solution exist?


The percentage number of solutions in our search space affects the difference between $|s\rangle$ and $\left|s^{\prime}\right\rangle$. For example, if there are not many solutions, $|s\rangle$ will be very close to $\left|s^{\prime}\right\rangle$ and $\theta$ will be very small. It turns out that the eigenvalues of the Grover iterator are $e^{\pm i \theta}$, and **we can extract this using quantum phase estimation (QPE) to estimate the number of solutions $(M)$**.

**First we want to get $\theta$ from measured_int. (phase estimation)**

* You will remember that $\mathrm{QPE}$ gives us a measured value $=2^{t} \phi$ from the eigenvalue $e^{2 \pi i \phi}$, where $t$ is the number of counting qubits,

* so to get $\theta$ we need to do:

> $
\theta=\text { value } \times \frac{2 \pi}{2^{t}}
$

**Second, we calculate the inner product of $|s\rangle$ and $\left|s^{\prime}\right\rangle:$**

> $
\left\langle s^{\prime} \mid s\right\rangle=\cos \frac{\theta}{2}
$

* And that $|s\rangle$ (a uniform superposition of computational basis states) can be written in terms of $|\omega\rangle$ and $\left|s^{\prime}\right\rangle$ as:

> $
|s\rangle=\sqrt{\frac{M}{N}}|\omega\rangle+\sqrt{\frac{N-M}{N}}\left|s^{\prime}\right\rangle
$

* The inner product of $|s\rangle$ and $\left|s^{\prime}\right\rangle$ is:

> $
\left\langle s^{\prime} \mid s\right\rangle=\sqrt{\frac{N-M}{N}}=\cos \frac{\theta}{2}$

* From this, we can use some trigonometry and algebra to show:

> $
N \sin ^{2} \frac{\theta}{2}=M
$

**Third, calculate number of solutions**

* From the Grover's algorithm chapter, you will remember that a common way to create a diffusion operator, $U_{s}$, is actually to implement $-U_{s}$.

* This implementation is used in the Grover iteration provided in this chapter. In a normal Grover search, this phase is global and can be ignored, but now we are controlling our Grover iterations, this phase does have an effect.

* The result is that we have effectively searched for the states that are not solutions, and our quantum counting algorithm will tell us how many states are not solutions. To fix this, we simply calculate

> $N-M$
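The three steps can be combined into one small function. The name `count_solutions` and the example readout value 11 are assumptions made for this sketch; `t` is the number of counting qubits and `n` the number of search qubits.

```python
import math

def count_solutions(measured_int, t, n):
    """Estimate the number of solutions M from a QPE readout.

    Assumes the Grover iterate was built with -U_s, so the raw estimate
    N * sin^2(theta / 2) counts non-solutions and is subtracted from N.
    """
    N = 2 ** n
    theta = measured_int * 2 * math.pi / 2 ** t
    non_solutions = N * math.sin(theta / 2) ** 2
    return N - non_solutions

# Hypothetical readout: 11 on t = 4 counting qubits, n = 4 search qubits.
print(count_solutions(measured_int=11, t=4, n=4))
```

The result is generally not an integer, because QPE with a finite number of counting qubits only approximates $\theta$.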

The ability to perform quantum counting efficiently is needed in order to use Grover's search algorithm (because running Grover's search algorithm requires knowing how many solutions exist). Moreover, this algorithm solves the quantum existence problem (namely, deciding whether any solution exists) as a special case.

###### *Harrow-Hassidim-Lloyd Algorithm (HHL)*

https://medium.com/mit-6-s089-intro-to-quantum-computing/using-quantum-algorithms-to-compute-discrete-logs-bad2c94f6df6

https://medium.com/mit-6-s089-intro-to-quantum-computing/quantum-neural-networks-7b5bc469d984

https://medium.com/mit-6-s089-intro-to-quantum-computing/hhl-solving-linear-systems-of-equations-with-quantum-computing-efb07eb32f74

<font color="blue">**Solving a system of linear equations with a quantum computer (HHL)**

* **Given a matrix $A \in \mathbb{C}^{N \times N}$ and a vector $\vec{b} \in \mathbb{C}^{N}$, find $\vec{x} \in \mathbb{C}^{N}$ satisfying $A \vec{x}=\vec{b}$**

* The spectrum of $A$ is given by: $A\left|v_{j}\right\rangle=\lambda_{j}\left|v_{j}\right\rangle, 1 \geq\left|\lambda_{j}\right| \geq 1 / \kappa$


**Objective: We want to solve a system of linear equations by finding $\vec{x}$**

* Familiar methods of solutions: Substitution method, Graphical method, Matrix method, Cramer's rule, Gaussian elimination

* The classical algorithm returns the full solution, while the HHL can only approximate functions of the solution vector.


> $A \vec{x} = \vec{b}$

Classically you would take the inverse of $A$ (via spectral decomposition / eigendecomposition):

> $\vec{x} = A^{-1} \vec{b}$

The first step towards solving a system of linear equations with a quantum computer is to encode the problem in the quantum language.

* By rescaling the system, we can assume $\vec{b}$ and $\vec{x}$ to be normalised and map them to the respective quantum states $|b\rangle$ and $|x\rangle$.

* Usually the mapping used is such that $i^{\text {th }}$ component of $\vec{b}$ (resp. $\vec{x}$ ) corresponds to the amplitude of the $i^{\text {th }}$ basis state of the quantum state $|b\rangle$ (resp. $|x\rangle$ ).

From now on, we will focus on the rescaled problem

><font color="blue">$A|x\rangle=|b\rangle
$</font> $\quad$ (System of linear equations in a quantum state)

And we want to find this:

><font color="blue">$|x\rangle=A^{-1}|b\rangle$</font> $\quad$ (the solution is: $|x\rangle = \sum_{j=0}^{N-1} \lambda_{j}^{-1} b_{j}\left|u_{j}\right\rangle$)

We need to find the inverse matrix $A^{-1}$. We can get the matrix inverse via eigendecomposition. Since $A$ is Hermitian (normal!), it has a spectral decomposition:

>$
A=\sum_{j=0}^{N-1} \lambda_{j}\left|u_{j}\right\rangle\left\langle u_{j}\right|, \quad \lambda_{j} \in \mathbb{R}
$

where $\left|u_{j}\right\rangle$ is the $j^{t h}$ eigenvector of $A$ with respective eigenvalue $\lambda_{j}$. Then,

>$
A^{-1}=\sum_{j=0}^{N-1} \lambda_{j}^{-1}\left|u_{j}\right\rangle\left\langle u_{j}\right|
$

and the right hand side of the system can be written in the eigenbasis of $A$ as

>$
|b\rangle=\sum_{j=0}^{N-1} b_{j}\left|u_{j}\right\rangle, \quad b_{j} \in \mathbb{C}
$

It is useful to keep in mind that the goal of the HHL is to exit the algorithm with the readout register in the state

>$
|x\rangle=A^{-1}|b\rangle=\sum_{j=0}^{N-1} \lambda_{j}^{-1} b_{j}\left|u_{j}\right\rangle
$

Note that here we already have an implicit normalisation constant since we are talking about a quantum state.
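The spectral route to $A^{-1}|b\rangle$ can be emulated classically for a tiny system. The 2x2 Hermitian matrix and the vector below are hypothetical values chosen for this sketch; `np.linalg.eigh` supplies the eigenvalues $\lambda_j$ and eigenvectors $|u_j\rangle$.

```python
import numpy as np

# Hypothetical 2x2 Hermitian system, chosen only for illustration.
A = np.array([[1.0, -1/3], [-1/3, 1.0]])
b = np.array([1.0, 0.0])

# Spectral decomposition A = sum_j lambda_j |u_j><u_j|.
lam, U = np.linalg.eigh(A)      # columns of U are the eigenvectors u_j

# Coefficients b_j = <u_j|b>, then x = sum_j (b_j / lambda_j) |u_j>.
b_coeffs = U.conj().T @ b
x = U @ (b_coeffs / lam)

print(x, np.linalg.solve(A, b))  # both routes should agree
print(x / np.linalg.norm(x))     # HHL would encode this normalised vector
```

This only mirrors the linear algebra; on a quantum computer the eigenvalues are obtained by phase estimation and the result is a state, not a list of numbers.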

**HHL-Algorithm**

*Main Subroutines in HHL: Hamiltonian simulation, Phase estimation (newer: linear combination of unitaries) and (Variable-time) amplitude amplification*

1. Prepare the initial state $|b\rangle$. Note that $|b\rangle=\sum_{j} c_{j}\left|v_{j}\right\rangle$.

2. Use the so-called phase estimation algorithm to perform the map
$|b\rangle \rightarrow \sum_{j} c_{j}\left|v_{j}\right\rangle\left|\tilde{\lambda}_{j}\right\rangle$

* $|\tilde{\lambda}_{j}\rangle$ -> This register contains the eigenvalue estimates.

3. Apply a one-qubit conditional rotation to perform the map
$|0\rangle \rightarrow \frac{1}{\kappa \tilde{\lambda}_{j}}|0\rangle+\sqrt{1-\frac{1}{\kappa^{2} \tilde{\lambda}_{j}^{2}}}|1\rangle$

4. Undo step 2 - apply the inverse of phase estimation
$\sum_{j} \frac{c_{j}}{\kappa \tilde{\lambda}_{j}}\left|v_{j}\right\rangle|0\rangle+|\mathrm{bad}\rangle|1\rangle \approx \frac{1}{\kappa A}|b\rangle|0\rangle+|\mathrm{bad}\rangle|1\rangle$

5. Use amplitude amplification to get rid of the "bad" part of the state, which is flagged by $|1\rangle$


**Applications of HHL**

* Systems of linear equations arise naturally in many real-life applications in a wide range of areas, such as in the solution of Partial Differential Equations, the calibration of financial models, fluid simulation or numerical field calculation.

* Used in many quantum machine learning algorithms as a building block


* The quantum algorithm for linear systems of equations has been applied to a support vector machine, which is an optimized linear or non-linear binary classifier (https://arxiv.org/abs/1307.0471v2)

* for Least-squares fitting (https://arxiv.org/abs/1204.5242)

* for finite-element-methods (https://arxiv.org/abs/1512.05903) (but only for higher problems which include solutions with higher-order derivatives and large spatial dimensions. For example, problems in many-body dynamics require the solution of equations containing derivatives on orders scaling with the number of bodies, and some problems in computational finance, such as Black-Scholes models, require large spatial dimensions)

**Promise**:

* Solving 10,000 linear equations: a classical computer needs at best about 10,000 steps, HHL on the order of $\log_2 10{,}000 \approx 13$. The [quantum algorithm for linear systems of equations](https://en.m.wikipedia.org/wiki/Quantum_algorithm_for_linear_systems_of_equations) was designed by Aram Harrow, Avinatan Hassidim, and Seth Lloyd. Provided the linear system is sparse and has a low condition number $\kappa$, and the user is interested in the result of a scalar measurement on the solution vector rather than the values of the solution vector itself, the algorithm has a runtime of $O\left(\log (N) \kappa^{2}\right)$, where $N$ is the number of variables in the linear system. This offers an exponential speedup over the fastest classical algorithm, which runs in $O(N \kappa)$ (or $O(N \sqrt{\kappa})$ for positive semidefinite matrices).

* Unlike the classical solutions to the Deutsch-Jozsa and search problems, most of our classical methods for matrix manipulation do work in polynomial time. However, as data analysis becomes more and more powerful (and more and more demanding on today’s computers), the size of these matrices can make even polynomial time too long.
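
The scaling claim above can be made concrete with a few lines of arithmetic. The sketch below only evaluates the leading-order expressions $\log_2(N)\kappa^2$ and $N\kappa$ quoted in the text; constants, sparsity and precision are ignored, and the function names `hhl_cost` and `classical_cost` are illustrative, not part of any library.

```python
import math

def hhl_cost(n_vars: int, kappa: float) -> float:
    """Leading-order HHL scaling log2(N) * kappa^2 (constants, sparsity, precision ignored)."""
    return math.log2(n_vars) * kappa**2

def classical_cost(n_vars: int, kappa: float) -> float:
    """Leading-order classical scaling N * kappa quoted in the text (constants ignored)."""
    return n_vars * kappa

for n_vars in (10**4, 10**6, 10**9):
    for kappa in (1.0, 10.0, 1000.0):
        q = hhl_cost(n_vars, kappa)
        c = classical_cost(n_vars, kappa)
        print(f"N={n_vars:>10}  kappa={kappa:>6}  HHL~{q:.3g}  classical~{c:.3g}")
```

For $\kappa = 1$ this reproduces the "13 versus 10,000" comparison. For $N = 10^4$ and $\kappa = 1000$ the HHL expression is already larger than the classical one, which is why a low condition number is part of the promise.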

**Disadvantages:**

* The solution vector is not returned explicitly; the algorithm prepares a quantum state proportional to the solution. Reading out the full solution vector would take $O(N)$ time, so the logarithmic runtime is only maintained when the output is an expectation value such as $\langle x|M|x\rangle$, where $M$ is a quantum-mechanical operator. Therefore, **HHL is useful mainly in applications where only samples from the solution vector are needed**.

* The matrix has to be sparse: although HHL is exponentially faster than conjugate gradient in $N$, it is polynomially slower in the sparsity $s$ and the condition number $\kappa$, so HHL is restricted to matrices that are sparse and have low condition numbers.

* The matrix must be robustly invertible, i.e. well-conditioned: the ratio $\kappa$ between its largest and smallest eigenvalue magnitudes must be small.

* Preparing the input vector $|b\rangle$ as a quantum state is complicated and may itself cancel the speedup.
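
Whether a given matrix meets these conditions can be checked classically for small instances. The following is a minimal sketch, assuming NumPy; the helper `hhl_preconditions` and the tridiagonal example matrix are hypothetical and only illustrate the three quantities named above (Hermiticity, sparsity $s$, condition number $\kappa$).

```python
import numpy as np

def hhl_preconditions(A: np.ndarray) -> dict:
    """Report Hermiticity, row sparsity s, and condition number kappa of A."""
    return {
        "hermitian": bool(np.allclose(A, A.conj().T)),
        # s = maximum number of nonzero entries in any row
        "sparsity": int(np.max(np.count_nonzero(A, axis=1))),
        # 2-norm condition number: largest / smallest singular value
        "kappa": float(np.linalg.cond(A)),
    }

# Hypothetical example: a small tridiagonal (discrete Laplacian-like) matrix.
A_example = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])
report = hhl_preconditions(A_example)
print(report)
```

For this example the eigenvalues are $2-\sqrt{2}$, $2$ and $2+\sqrt{2}$, so $\kappa = 3 + 2\sqrt{2} \approx 5.83$ and at most three entries per row are nonzero.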


**Sources:**

* https://en.m.wikipedia.org/wiki/System_of_linear_equations#Matrix_solution

* [Peter Wittek: Quantum Machine Learning - 37 - Overview of the HHL Algorithm](https://m.youtube.com/watch?v=hQpdPM-6wtU)

* https://qiskit.org/textbook/ch-applications/hhl_tutorial.html

* https://github.com/quantumlib/Cirq/blob/master/examples/hhl.py

* [Google Quantum: Quantum Algorithms for Systems of Linear Equations (Quantum Summer Symposium 2020)](https://m.youtube.com/watch?v=Xvp56xeNZo4)

* https://www.quantamagazine.org/new-quantum-algorithms-finally-crack-nonlinear-equations-20210105/

* [A New Approach to Multiplication Opens the Door to Better Quantum Computers](https://www.quantamagazine.org/a-new-approach-to-multiplication-opens-the-door-to-better-quantum-computers-20190424/)

* https://www.quantamagazine.org/new-algorithm-breaks-speed-limit-for-solving-linear-equations-20210308/

* https://www.quantamagazine.org/teenager-finds-classical-alternative-to-quantum-recommendation-algorithm-20180731/

* https://www.quantamagazine.org/mathematicians-inch-closer-to-matrix-multiplication-goal-20210323/

In [None]:
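# Classical result: eigenvalues w (ascending) and eigenvectors v (as columns) of the Hermitian matrix A.
# A is assumed to be defined in an earlier cell.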
from numpy import linalg as LA
w, v = LA.eigh(A)
w

In [None]:
# pylint: disable=wrong-or-nonexistent-copyright-notice
"""Demonstrates the algorithm for solving linear systems by Harrow, Hassidim, Lloyd (HHL).

The HHL algorithm solves a system of linear equations, specifically equations of the form Ax = b,
where A is a Hermitian matrix, b is a known vector, and x is the unknown vector. To solve on a
quantum system, b must be rescaled to have magnitude 1, and the equation becomes:

|x> = A**-1 |b> / || A**-1 |b> ||

The algorithm uses 3 sets of qubits: a single ancilla qubit, a register (to store eigenvalues of
A), and memory qubits (to store |b> and |x>). The following are performed in order:
1) Quantum phase estimation to extract eigenvalues of A
2) Controlled rotations of ancilla qubit
3) Uncomputation with inverse quantum phase estimation

For details about the algorithm, please refer to papers in the REFERENCE section below. The
following description uses variables defined in the HHL paper.

This example is an implementation of the HHL algorithm for arbitrary 2x2 Hermitian matrices. The
output of the algorithm are the expectation values of Pauli observables of |x>. Note that the
accuracy of the result depends on the following factors:
* Register size
* Choice of parameters C and t

The result is perfect if
* Each eigenvalue of the matrix is in the form

  2π/t * k/N,

  where 0≤k<N, and N=2^n, where n is the register size. In other words, k is a value that can be
  represented exactly by the register.
* C ≤ 2π/t * 1/N, the smallest eigenvalue that can be stored in the circuit.

The result is good if the register size is large enough such that for every pair of eigenvalues,
the ratio can be approximated by a pair of possible register values. Let s be the scaling factor
from possible register values to eigenvalues. One way to set t is

t = 2π/(sN)

For arbitrary matrices, because properties of their eigenvalues are typically unknown, parameters C
and t are fine-tuned based on their condition number.


=== REFERENCE ===
Harrow, Aram W. et al. Quantum algorithm for solving linear systems of
equations (the HHL paper)
https://arxiv.org/abs/0811.3171

Coles, Eidenbenz et al. Quantum Algorithm Implementations for Beginners
https://arxiv.org/abs/1804.03719

=== CIRCUIT ===
Example of circuit with 2 register qubits.

(0, 0): ─────────────────────────Ry(θ₄)─Ry(θ₁)─Ry(θ₂)─Ry(θ₃)──────────────M──
                     ┌──────┐    │      │      │      │ ┌───┐
(1, 0): ─H─@─────────│      │──X─@──────@────X─@──────@─│   │─────────@─H────
           │         │QFT^-1│    │      │      │      │ │QFT│         │
(2, 0): ─H─┼─────@───│      │──X─@────X─@────X─@────X─@─│   │─@───────┼─H────
           │     │   └──────┘                           └───┘ │       │
(3, 0): ───e^iAt─e^2iAt───────────────────────────────────────e^-2iAt─e^-iAt─

Note: QFT in the above diagram omits swaps, which are included implicitly by
reversing qubit order for phase kickbacks.
"""

import math
import numpy as np
import sympy
import cirq


class PhaseEstimation(cirq.Gate):
    """A gate for Quantum Phase Estimation.

    The last qubit stores the eigenvector; all other qubits store the estimated phase,
    in big-endian.

    Args:
        num_qubits: The number of qubits of the unitary.
        unitary: The unitary gate whose phases will be estimated.
    """

    def __init__(self, num_qubits, unitary):
        self._num_qubits = num_qubits
        self.U = unitary

    def num_qubits(self):
        return self._num_qubits

    def _decompose_(self, qubits):
        qubits = list(qubits)
        yield cirq.H.on_each(*qubits[:-1])
        yield PhaseKickback(self.num_qubits(), self.U)(*qubits)
        yield cirq.qft(*qubits[:-1], without_reverse=True) ** -1


class HamiltonianSimulation(cirq.EigenGate):
    """A gate that represents e^iAt.

    This EigenGate + np.linalg.eigh() implementation is used here purely for demonstrative
    purposes. If a large matrix is used, the circuit should implement actual Hamiltonian
    simulation, by using the linear operators framework in Cirq, for example.
    """

    def __init__(self, A, t, exponent=1.0):
        cirq.EigenGate.__init__(self, exponent=exponent)
        self.A = A
        self.t = t
        ws, vs = np.linalg.eigh(A)
        self.eigen_components = []
        for w, v in zip(ws, vs.T):
            theta = w * t / math.pi
            P = np.outer(v, np.conj(v))
            self.eigen_components.append((theta, P))

    def _num_qubits_(self) -> int:
        return 1

    def _with_exponent(self, exponent):
        return HamiltonianSimulation(self.A, self.t, exponent)

    def _eigen_components(self):
        return self.eigen_components


class PhaseKickback(cirq.Gate):
    """A gate for the phase kickback stage of Quantum Phase Estimation.

    It consists of a series of controlled e^iAt gates with the memory qubit as the target and
    each register qubit as the control, raised to the power of 2 based on the qubit index.
    unitary is the unitary gate whose phases will be estimated.
    """

    def __init__(self, num_qubits, unitary):
        super().__init__()
        self._num_qubits = num_qubits
        self.U = unitary

    def num_qubits(self):
        return self._num_qubits

    def _decompose_(self, qubits):
        qubits = list(qubits)
        memory = qubits.pop()
        for i, qubit in enumerate(qubits):
            yield cirq.ControlledGate(self.U ** (2**i))(qubit, memory)


class EigenRotation(cirq.Gate):
    """Perform a rotation on an ancilla equivalent to division by eigenvalues of a matrix.

    EigenRotation performs the set of rotation on the ancilla qubit equivalent to division on the
    memory register by each eigenvalue of the matrix. The last qubit is the ancilla qubit; all
    remaining qubits are the register, assumed to be big-endian.

    It consists of a controlled ancilla qubit rotation for each possible value that can be
    represented by the register. Each rotation is a Ry gate where the angle is calculated from
    the eigenvalue corresponding to the register value, up to a normalization factor C.
    """

    def __init__(self, num_qubits, C, t):
        super().__init__()
        self._num_qubits = num_qubits
        self.C = C
        self.t = t
        self.N = 2 ** (num_qubits - 1)

    def num_qubits(self):
        return self._num_qubits

    def _decompose_(self, qubits):
        for k in range(self.N):
            kGate = self._ancilla_rotation(k)

            # xor's 1 bits correspond to X gate positions.
            xor = k ^ (k - 1)

            for q in qubits[-2::-1]:
                # Place X gates
                if xor % 2 == 1:
                    yield cirq.X(q)
                xor >>= 1

                # Build controlled ancilla rotation
                kGate = cirq.ControlledGate(kGate)

            yield kGate(*qubits)

    def _ancilla_rotation(self, k):
        if k == 0:
            k = self.N
        theta = 2 * math.asin(self.C * self.N * self.t / (2 * math.pi * k))
        return cirq.ry(theta)


def hhl_circuit(A, C, t, register_size, *input_prep_gates):
    """Constructs the HHL circuit.

    Args:
        A: The input Hermitian matrix.
        C: Algorithm parameter, see above.
        t: Algorithm parameter, see above.
        register_size: The size of the eigenvalue register.
        *input_prep_gates: A list of gates to be applied to |0> to generate the desired input
            state |b>.

    Returns:
        The HHL circuit. The ancilla measurement has key 'a' and the memory measurement is in key
        'm'.  There are two parameters in the circuit, `exponent` and `phase_exponent` corresponding
        to a possible rotation  applied before the measurement on the memory with a
        `cirq.PhasedXPowGate`.
    """

    ancilla = cirq.LineQubit(0)
    # to store eigenvalues of the matrix
    register = [cirq.LineQubit(i + 1) for i in range(register_size)]
    # to store input and output vectors
    memory = cirq.LineQubit(register_size + 1)

    c = cirq.Circuit()
    hs = HamiltonianSimulation(A, t)
    pe = PhaseEstimation(register_size + 1, hs)
    c.append([gate(memory) for gate in input_prep_gates])
    c.append(
        [
            pe(*(register + [memory])),
            EigenRotation(register_size + 1, C, t)(*(register + [ancilla])),
            pe(*(register + [memory])) ** -1,
            cirq.measure(ancilla, key='a'),
        ]
    )

    c.append(
        [
            cirq.PhasedXPowGate(
                exponent=sympy.Symbol('exponent'), phase_exponent=sympy.Symbol('phase_exponent')
            )(memory),
            cirq.measure(memory, key='m'),
        ]
    )

    return c


def simulate(circuit):
    simulator = cirq.Simulator()

    # Cases for measuring X, Y, and Z (respectively) on the memory qubit.
    params = [
        {'exponent': 0.5, 'phase_exponent': -0.5},
        {'exponent': 0.5, 'phase_exponent': 0},
        {'exponent': 0, 'phase_exponent': 0},
    ]

    results = simulator.run_sweep(circuit, params, repetitions=5000)

    for label, result in zip(('X', 'Y', 'Z'), list(results)):
        # Only select cases where the ancilla is 1.
        # TODO: optimize using amplitude amplification algorithm.
        # Github issue: https://github.com/quantumlib/Cirq/issues/2216
        expectation = 1 - 2 * np.mean(result.measurements['m'][result.measurements['a'] == 1])
        print(f'{label} = {expectation}')


def main():
    """The main program loop.

    Simulates HHL with matrix input, and outputs Pauli observables of the resulting qubit state |x>.
    Expected observables are calculated from the expected solution |x>.
    """

    # Eigendecomposition:
    #   (4.537, [-0.971555, -0.0578339+0.229643j])
    #   (0.349, [-0.236813, 0.237270-0.942137j])
    # |b> = (0.64510-0.47848j, 0.35490-0.47848j)
    # |x> = (-0.0662724-0.214548j, 0.784392-0.578192j)
    A = np.array(
        [
            [4.30213466 - 6.01593490e-08j, 0.23531802 + 9.34386156e-01j],
            [0.23531882 - 9.34388383e-01j, 0.58386534 + 6.01593489e-08j],
        ]
    )

    t = 0.358166 * math.pi
    register_size = 4
    input_prep_gates = [cirq.rx(1.276359), cirq.rz(1.276359)]
    expected = (0.144130, 0.413217, -0.899154)

    # Set C to be the smallest eigenvalue that can be represented by the
    # circuit.
    C = 2 * math.pi / (2**register_size * t)

    # Simulate circuit.
    print("Expected observable outputs:")
    print("X =", expected[0])
    print("Y =", expected[1])
    print("Z =", expected[2])
    print("Actual: ")
    simulate(hhl_circuit(A, C, t, register_size, *input_prep_gates))


if __name__ == '__main__':
    main()
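
The values in `expected` can be cross-checked classically. The sketch below assumes only NumPy and the standard matrix forms of the `rx` and `rz` rotations; it rebuilds $|b\rangle$ from the input preparation gates, solves $Ax = b$ directly, normalizes the result and evaluates the three Pauli expectation values. It is a reference calculation, not part of the Cirq example.

```python
import numpy as np

# Matrix and rotation angle copied from main() above.
A = np.array(
    [
        [4.30213466 - 6.01593490e-08j, 0.23531802 + 9.34386156e-01j],
        [0.23531882 - 9.34388383e-01j, 0.58386534 + 6.01593489e-08j],
    ]
)
theta = 1.276359

# |b> = Rz(theta) Rx(theta) |0>, with Rx = exp(-i X theta/2), Rz = exp(-i Z theta/2).
rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
               [-1j * np.sin(theta / 2), np.cos(theta / 2)]])
rz = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
b = rz @ rx @ np.array([1.0, 0.0])

# Classical solution of A x = b, normalized like the quantum state |x>.
x = np.linalg.solve(A, b)
x = x / np.linalg.norm(x)

paulis = {
    "X": np.array([[0, 1], [1, 0]]),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.array([[1, 0], [0, -1]]),
}
# <x|P|x> for each Pauli operator; np.vdot conjugates its first argument.
expectations = {name: float(np.real(np.vdot(x, P @ x))) for name, P in paulis.items()}
for name, value in expectations.items():
    print(f"{name} = {value:.6f}")
```

The printed values should agree with `expected = (0.144130, 0.413217, -0.899154)` up to rounding, while the simulated circuit only approaches them within sampling and register-size error.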

###### *Bell State and Superdense Coding*

https://en.m.wikipedia.org/wiki/Bell_state

###### *Bernstein-Vazirani Algorithm*

https://www.youtube.com/watch?v=sqJIpHYl7oo

###### *Hidden Subgroup Problem*

**Hidden subgroup problem**

The dihedral hidden subgroup problem https://arxiv.org/abs/2106.09907

https://en.m.wikipedia.org/wiki/Hidden_subgroup_problem

[Examples of HSP over non-Abelian groups](https://www.youtube.com/watch?v=D-pfWjGqENg)

* The talk describes the HSP over symmetric and dihedral groups and their applications.

* Example with two graphs A and B: it is not sufficient that they have the same number of vertices and edges.

* They must also have the same connectivity (be isomorphic): under a suitable relabeling of the vertices, two vertices such as 5 and 10 are connected in graph A exactly when they are connected in graph B.

* Graph isomorphism is in NP: given a candidate solution (a relabeling), it is easy to check whether it is correct.

* Graph automorphism: a relabeling of a graph that maps it onto itself; the question is whether a non-trivial one exists besides the identity map.


[Hidden Subgroup Problem](https://en.m.wikipedia.org/wiki/Hidden_subgroup_problem) (HSP) generalizes many problems on which quantum computers offer a potential exponential speed-up over classical computers and derives its significance from two important and related questions. First, the study of the problem sheds some light on what it is that quantum computers are really good at. Second, the ability to solve HSP for non-Abelian groups in polynomial time would enable us to extend the set of problems on which quantum computers offer substantial performance advantage. In this introductory talk, I will recap essential concepts from group theory before defining the Hidden Subgroup Problem and describing the quantum algorithm that solves it quickly. I will end by showing how to express a number of well-known problems, such as Simon's problem, discrete logarithm and graph isomorphism, as instances of HSP.


The Hidden Subgroup Problem (HSP) is a computational problem in quantum computing and cryptography. It generalizes the problems that underlie the Discrete Logarithm Problem (DLP) and the Integer Factorization Problem (IFP), both of which are important in modern cryptography.

In the Hidden Subgroup Problem, the input is a finite group G and a function f: G -> X, where X is a finite set. The function is promised to hide a subgroup H of G: f(x) = f(y) exactly when x and y lie in the same coset of H, so f is constant on each coset and takes distinct values on different cosets. The goal is to find H (for example, a generating set for it) using as few evaluations of f as possible. In simpler terms, the problem involves finding a hidden structure within a group based on the information provided by the function.
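
To make the definition concrete, here is a minimal sketch for the Abelian group $\mathbb{Z}_2^3$ (bit strings under XOR), which is the setting of Simon's problem. The secret string `s`, the oracle `f` and the checker `hides` are hypothetical choices made for illustration; the check simply tests that `f` is constant on each coset of H and distinct across cosets.

```python
from itertools import product

n = 3
G = list(product((0, 1), repeat=n))      # elements of Z_2^n as bit tuples
s = (1, 0, 1)                            # hypothetical secret; hidden subgroup H = {0, s}

def add(x, y):
    """Group operation of Z_2^n: bitwise XOR."""
    return tuple(a ^ b for a, b in zip(x, y))

def f(x):
    """Example oracle: label each coset {x, x+s} by its smaller element."""
    return min(x, add(x, s))

def hides(f, G, H):
    """True if f(x) == f(y) exactly when x and y lie in the same coset of H."""
    return all((f(x) == f(y)) == (add(x, y) in H) for x in G for y in G)

H = {(0,) * n, s}
print(hides(f, G, H))            # f hides H
print(hides(f, G, {(0,) * n}))   # f does not hide the trivial subgroup
```

In $\mathbb{Z}_2^n$ two elements lie in the same coset of H exactly when their XOR is in H, which is what `hides` uses.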

The Hidden Subgroup Problem has important implications for cryptography because factoring large numbers and computing discrete logarithms can be reduced to it. In these cases the underlying group is Abelian, and Shor's algorithm solves the corresponding HSP efficiently on a quantum computer. This is why public-key cryptosystems like RSA and elliptic curve cryptography, which rely on the difficulty of factoring large numbers or solving the DLP, would be broken by a sufficiently large quantum computer.

Efficient quantum algorithms based on the quantum Fourier transform, such as Shor's algorithm, are known for the HSP over finite Abelian groups. However, no efficient quantum algorithm has been discovered yet for arbitrary groups, although efficient algorithms have been developed for some specific non-Abelian cases.
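
For the Abelian case of Simon's problem, the classical post-processing of the Fourier-sampling step can be sketched without a quantum simulator. The assumption here is that each run of the quantum circuit returns a uniformly random bit string y with y·s = 0 (mod 2); the function `emulated_simon_sample` draws from that distribution classically, and `recover_secret` filters candidates by brute force, where a real implementation would use Gaussian elimination over GF(2). All names are illustrative.

```python
import random

def dot_mod2(a: int, b: int) -> int:
    """Inner product of two bit strings (stored as ints) modulo 2."""
    return bin(a & b).count("1") % 2

def emulated_simon_sample(secret: int, n: int, rng: random.Random) -> int:
    """Emulate one measurement outcome: a uniform y with y . secret = 0 (mod 2)."""
    while True:
        y = rng.randrange(2**n)
        if dot_mod2(y, secret) == 0:
            return y

def recover_secret(n: int, sample) -> int:
    """Filter nonzero candidates until only one is orthogonal to every sample."""
    candidates = set(range(1, 2**n))
    while len(candidates) > 1:
        y = sample()
        candidates = {s for s in candidates if dot_mod2(y, s) == 0}
    return candidates.pop()

rng = random.Random(1)
secret = 0b1011   # hypothetical hidden string; the hidden subgroup is {0, secret}
found = recover_secret(4, lambda: emulated_simon_sample(secret, 4, rng))
print(bin(found))
```

Each sample removes about half of the remaining wrong candidates, so roughly n samples suffice, whereas a classical algorithm querying f directly needs exponentially many evaluations in the worst case.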

The study of the Hidden Subgroup Problem and its related algorithms is an active area of research in quantum computing and cryptography, with the aim of better understanding the limits and possibilities of quantum computing's impact on cryptographic systems.

What is the challenge in the hidden subgroup problem?

The main challenge in the Hidden Subgroup Problem (HSP) is to efficiently find the hidden subgroup within a given group based on the information provided by a function. The difficulty lies in identifying the underlying structure or pattern that the function reveals about the subgroup.

One of the key aspects that makes the HSP challenging is that the function only provides partial information about the subgroup. The function maps elements of the group to a set of values, but it does not directly reveal the subgroup itself. Instead, the function is designed in such a way that it distinguishes elements of different cosets of the hidden subgroup while keeping elements within the same coset indistinguishable. This limited information makes it difficult to directly extract the subgroup.

Another challenge is that the HSP encompasses a wide range of group structures. While some specific cases of the problem have been efficiently solved using quantum algorithms, finding a general solution that works for arbitrary groups remains an open question. The difficulty varies depending on the properties and structure of the group involved.

Additionally, the HSP is of particular interest in quantum computing because of its cryptographic consequences. The Abelian case already covers factoring large numbers and the Discrete Logarithm Problem. An efficient algorithm for the dihedral group would affect certain lattice problems, and one for the symmetric group would solve graph isomorphism.

Despite efforts to develop quantum algorithms for the HSP, no general solution has been found yet. The challenge lies in understanding the underlying mathematical structures, devising efficient algorithms, and leveraging the power of quantum computation to solve the problem in a scalable and practical manner.

Do you have a concrete example for a hidden subgroup?


One of the well-known examples of the Hidden Subgroup Problem involves the symmetric group, denoted as S<sub>n</sub>, which consists of all possible permutations of n elements.

Consider n = 3, where the elements of S<sub>3</sub> are the permutations of three objects {1, 2, 3}, written here as the tuple of images (σ(1), σ(2), σ(3)). The hidden subgroup problem asks for a hidden subgroup of S<sub>3</sub> given a function f: S<sub>3</sub> -> X, where X is some set.

In this scenario, the function f could be defined as follows:

* f(1, 2, 3) = 'A'
* f(1, 3, 2) = 'A'
* f(2, 1, 3) = 'B'
* f(2, 3, 1) = 'B'
* f(3, 1, 2) = 'C'
* f(3, 2, 1) = 'C'

Here, the function assigns the labels 'A', 'B', and 'C' to the permutations in S<sub>3</sub>. Each label marks one coset of the hidden subgroup, and the hidden subgroup itself is the coset that contains the identity.

In this particular example, the identity permutation (1, 2, 3) is labeled 'A', and the only other permutation with that label is (1, 3, 2), which swaps the objects 2 and 3. So the hidden subgroup is H = { (1, 2, 3), (1, 3, 2) }. The permutations labeled 'B' and 'C' form the other two cosets of H (left cosets gH, with the composition (g∘h)(i) = g(h(i))).

The challenge is to find the hidden subgroup from the function's outputs alone, ideally with far fewer evaluations of f than there are group elements.

What would be a potential algorithm in this case?

For a group as small as S<sub>3</sub>, the hidden subgroup can be found classically by brute force, evaluating f on every element:

1. Evaluate f on the identity permutation e and record the label f(e).
2. Evaluate f on every other element g of S<sub>3</sub>.
3. Collect H = { g : f(g) = f(e) }.
4. Optionally verify the promise: H is closed under composition, and f is constant on each coset gH and takes different values on different cosets.

For S<sub>3</sub>, this procedure returns the hidden subgroup { (1, 2, 3), (1, 3, 2) } for the given function f.

It's important to note that brute force needs |G| evaluations of f, which is n! for S<sub>n</sub>, so it does not scale. Quantum algorithms instead use coset sampling followed by a Fourier transform. This is efficient for Abelian groups, while the Hidden Subgroup Problem for other groups often requires different algorithms and techniques tailored to the specific group's structure.

Can you write this as a python function?

A Python function for this procedure, here called hidden_subgroup_algorithm_S3, takes as its parameter f the given function that maps permutations to labels. It iterates through the elements of S<sub>3</sub>, collects those whose label equals the label of the identity, and returns them as the hidden subgroup.

The example function defined above, called example_function, can be passed as f. Modifying it or providing a different f allows experimenting with other scenarios.
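
A minimal sketch of such a function follows. It is a classical brute-force check for the S<sub>3</sub> example, not a quantum algorithm. The names `hidden_subgroup_algorithm_S3` and `example_function` follow the text; `compose` and `hides` are hypothetical helpers, and permutations are stored as tuples of images.

```python
from itertools import permutations

S3 = list(permutations((1, 2, 3)))   # p[i-1] is the image of i under p
IDENTITY = (1, 2, 3)

def compose(g, h):
    """Return g∘h, i.e. apply h first and then g."""
    return tuple(g[h[i] - 1] for i in range(3))

def example_function(p):
    """The example oracle from the text: the label depends only on the image of 1."""
    return {1: 'A', 2: 'B', 3: 'C'}[p[0]]

def hidden_subgroup_algorithm_S3(f):
    """Brute force: collect all elements that share the identity's label."""
    return {g for g in S3 if f(g) == f(IDENTITY)}

def hides(f, H):
    """Check that f is constant on each left coset gH and distinct across cosets."""
    def coset(g):
        return frozenset(compose(g, h) for h in H)
    return all((f(g) == f(k)) == (coset(g) == coset(k)) for g in S3 for k in S3)

H = hidden_subgroup_algorithm_S3(example_function)
print(sorted(H))                      # [(1, 2, 3), (1, 3, 2)]
print(hides(example_function, H))     # True
```

The check in `hides` matters: a function whose labels do not line up with the cosets of any subgroup does not satisfy the HSP promise, and collecting the identity's label class would then return a set that is not a subgroup.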

###### *Quantum Algorithm for Nonlinear Differential Equations*

Well-established quantum algorithms exist for linear PDEs, whereas CFD is highly nonlinear. That didn't stop Airbus from pretending that quantum algorithms for nonlinear PDEs existed for about five years before they did, and even running a competition about it (which nobody won).

But it turns out that more recently there has been a bit of progress in developing quantum algorithms for nonlinear PDEs with bounded nonlinearities. I am still a bit skeptical that these will end up applying to the type of problems Airbus is interested in, but it is not totally crazy to think there might be something here for certain CFD problems. Bill on our team has recently started a project that aims to look into this.

https://www.pnas.org/doi/10.1073/pnas.2026805118

https://arxiv.org/abs/2303.16550

#### <font color="blue">**Computational Learning**

###### *Computational Learning Theory*

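Boolean functions: the target is some unknown concept $c : \{0,1\}^n \to \{0,1\}$

Compute a Boolean function $f : \{0,1\}^N \to \{0,1\}$

$|0\rangle \otimes |1\rangle \in \mathbb{C}^2 \otimes \mathbb{C}^2 = \mathbb{C}^4$

For $b \in \{0,1\}^k$, write: $|b\rangle = |b_1\rangle \otimes \cdots \otimes |b_k\rangle$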
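
The tensor-product notation can be checked numerically with the Kronecker product. This is a small sketch assuming NumPy; `basis_state` is an illustrative helper name.

```python
from functools import reduce
import numpy as np

KET = {"0": np.array([1.0, 0.0]), "1": np.array([0.0, 1.0])}

def basis_state(bits: str) -> np.ndarray:
    """Return |b_1> ⊗ ... ⊗ |b_k> as a vector in C^(2^k)."""
    return reduce(np.kron, (KET[bit] for bit in bits))

state = basis_state("01")
print(state)   # [0. 1. 0. 0.]
```

The single nonzero entry sits at the index whose binary representation is the bit string, so `basis_state("101")` has its 1 at index 5.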

*Online Machine Learning*

https://en.m.wikipedia.org/wiki/Online_machine_learning


*Stability (learning theory)*

https://en.m.wikipedia.org/wiki/Stability_(learning_theory)

> [Algorithmic information theory](https://en.m.wikipedia.org/wiki/Algorithmic_information_theory)



https://en.m.wikipedia.org/wiki/Quantum_information

Paper: Algorithmic information theory https://arxiv.org/abs/0809.2754

Example: the decimal expansion of pi is infinite, yet there are many simple formulas that can be used to calculate it to any desired accuracy.
* The Leibniz formula: This formula sums an infinite series of alternating terms, $\pi/4 = 1 - 1/3 + 1/5 - 1/7 + \cdots$. It converges to pi, but very slowly.
* The Machin formula: This formula uses the arctangent identity $\pi/4 = 4\arctan(1/5) - \arctan(1/239)$, and it converges much faster than the Leibniz formula for a given number of terms.
* The Chudnovsky formula: This formula is a rapidly converging series that yields roughly 14 digits per term, and it has been used to calculate pi to trillions of digits.
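
As an illustration of how short such a program can be, here is a minimal sketch of the Machin formula with integer arithmetic. The function names and the choice of ten guard digits are assumptions made for this example.

```python
def arctan_inv(x: int, unity: int) -> int:
    """Return arctan(1/x) * unity via the Taylor series, using integers only."""
    total = term = unity // x
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared          # next odd power of 1/x
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total

def pi_digits(digits: int) -> str:
    """Return pi truncated to the given number of decimal places (Machin's formula)."""
    guard = 10                       # extra digits that absorb truncation errors
    unity = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    text = str(pi_scaled // 10**guard)
    return text[0] + "." + text[1:]

print(pi_digits(20))   # 3.14159265358979323846
```

The program text stays the same length no matter how many digits are requested, which is the sense in which the digits of pi carry little algorithmic information.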

https://en.m.wikipedia.org/wiki/Algorithmic_probability

Gregory Chaitin, one of the theory's founders; see Chaitin's constant.

algorithmic information: It takes as its starting point an intuitive way of thinking about probability and complexity: the idea that, at least **for some outputs, it’s computationally easier to describe how to generate something than to actually generate it**.

Take the well-worn analogy of a monkey pressing keys on a computer at random. The chances of it typing out the first 15,000 digits of pi are absurdly slim — and those chances decrease exponentially as the desired number of digits grows.

But if the monkey’s keystrokes are instead interpreted as randomly written computer programs for generating pi, the odds of success, or “algorithmic probability,” improve dramatically. A code for generating the first 15,000 digits of pi in the programming language C, for instance, can be as short as 133 characters.

In other words, algorithmic information theory essentially says that the probability of producing some types of outputs is far greater when randomness operates at the level of the program describing it rather than at the level of the output itself, because that program will be short. In this way, complex structures — fractals, for instance — can be more easily produced by chance.
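
The size of this gap can be estimated with a line of arithmetic. The sketch below assumes, purely for illustration, that the monkey picks uniformly from 10 digit keys in the first case and from 95 printable ASCII characters in the second, and that exactly one 133-character program is counted.

```python
import math

DIGIT_KEYS = 10        # assumption: only the ten digit keys are used
PRINTABLE_KEYS = 95    # assumption: printable ASCII characters for the program text

def log10_probability(length: int, alphabet_size: int) -> float:
    """log10 of the chance of typing one specific string uniformly at random."""
    return -length * math.log10(alphabet_size)

direct = log10_probability(15_000, DIGIT_KEYS)
via_program = log10_probability(133, PRINTABLE_KEYS)
print(f"typing 15,000 digits directly:  about 10^{direct:.0f}")
print(f"typing a 133-character program: about 10^{via_program:.0f}")
```

Under these assumptions the chance rises from about $10^{-15000}$ to about $10^{-263}$: still tiny, but larger by thousands of orders of magnitude.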

But a problem with that approach quickly emerged: **Mathematicians learned that the algorithmic complexity (also known as Kolmogorov complexity, after Andrey Kolmogorov, one of the theory’s founders) of a given output — the length of the shortest possible program needed to specify it — is not computable**. Computer scientists therefore couldn’t determine the ideal way to compress a string or other object.
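
Although Kolmogorov complexity itself is not computable, any compressor gives a computable upper bound on it (up to an additive constant for the decompressor). The sketch below uses Python's `zlib` as such a proxy; the test strings and the helper name `compressed_length` are illustrative.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length of the zlib-compressed data: an upper-bound proxy, not K itself."""
    return len(zlib.compress(data, 9))

n = 10_000
structured = b"01" * (n // 2)                 # highly regular string
rng = random.Random(0)                        # fixed seed for reproducibility
noisy = bytes(rng.getrandbits(8) for _ in range(n))

print(compressed_length(structured), compressed_length(noisy))
```

The regular string shrinks to a few dozen bytes while the pseudo-random one does not shrink at all. The second case also shows the limits of the proxy: the bytes come from a short seeded program, so their true complexity is low, but a general-purpose compressor cannot discover that.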


https://www.quantamagazine.org/computer-science-and-biology-explore-algorithmic-evolution-20181129/

https://en.m.wikipedia.org/wiki/Algorithmic_information_theory

According to Gregory Chaitin, it is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."

**Algorithmic information theory (AIT)** is a branch of theoretical computer science that concerns itself with the relationship between computation and information of computably generated objects (as opposed to stochastically generated), such as strings or any other data structure.

AIT was founded independently by Ray Solomonoff, Andrey Kolmogorov, and Gregory Chaitin in the 1960s. Kolmogorov's main contribution was to define the notion of algorithmic complexity, which is a measure of the amount of information required to describe an object. The algorithmic complexity of an object is the length of the shortest computer program that can generate the object.

AIT has been used to study a variety of problems in theoretical computer science, including the halting problem, the compressibility of data, and the randomness of sequences. AIT has also been applied to problems in physics, biology, and philosophy.

A foundational assumption of AIT is the *Church-Turing thesis*, which states that any function that can be computed by an algorithm can also be computed by a Turing machine. Together with the invariance theorem, which says that changing the universal machine changes the complexity of an object by at most an additive constant, this makes algorithmic complexity a well-defined concept and makes it possible to compare the algorithmic complexity of different objects.

AIT is a complex and challenging field, but it has had a profound impact on our understanding of information and computation. AIT has also led to the development of new tools and techniques for studying complex systems.

Here are some of the key concepts in algorithmic information theory:

* **Algorithmic complexity:** The amount of information required to describe an object.
* **Turing machine:** A theoretical model of computation.
* **Church-Turing thesis:** Any function that can be computed by an algorithm can also be computed by a Turing machine.
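* **Kolmogorov complexity:** The algorithmic complexity of an object, as defined by Andrey Kolmogorov: the length of the shortest program that outputs it.
* **Solomonoff's algorithmic probability:** The probability that a universal machine running a random program outputs the object; it is closely related to Kolmogorov complexity.
* **Chaitin complexity:** The variant of Kolmogorov complexity based on self-delimiting (prefix-free) programs.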

Algorithmic information theory is a rapidly developing field, and there are many open problems that remain unsolved. However, AIT has already had a major impact on our understanding of information and computation, and it is likely to continue to be an important area of research in the years to come.

**Chaitin complexity**, also known as algorithmic complexity, is a measure of the amount of information required to describe an object. It is a variant of Kolmogorov complexity, defined as the length of the shortest self-delimiting computer program that can generate the object.

Chaitin developed the notion from the 1960s onward. It is closely tied to the halting problem, which is the problem of determining whether a given computer program will halt: an algorithm that computed Chaitin complexity could be used to solve the halting problem, and vice versa. Since the halting problem is undecidable, no algorithm can compute the Chaitin complexity of arbitrary objects.

Chaitin complexity has been used to study a variety of problems in theoretical computer science, including the compressibility of data, the randomness of sequences, and the halting problem. It has also been applied to problems in physics, biology, and philosophy.

Here are some of the key properties of Chaitin complexity:

* It is a non-computable function.
* It is upper semicomputable: any program that outputs the object gives an upper bound, so the complexity can be approximated from above, but in general a bound can never be certified as optimal.
* It is universal, meaning that it can be used to measure the complexity of any finitely describable object, and its value depends on the choice of programming language only up to an additive constant.

Chaitin complexity is a powerful tool for measuring the amount of information in an object. However, it is important to note that it measures algorithmic randomness, not statistical randomness. The digits of pi, for example, look statistically random but have low Chaitin complexity, because a short program generates them.

Here are some of the applications of Chaitin complexity:

* **Compressibility of data:** Chaitin complexity can be used to measure the compressibility of data. Data with a high Chaitin complexity is difficult to compress, while data with a low Chaitin complexity is easy to compress.
* **Randomness of sequences:** Chaitin complexity can be used to define the randomness of sequences. A sequence whose Chaitin complexity is close to its length is incompressible and is called algorithmically random.
* **Halting problem:** Chaitin complexity can be used to study the halting problem. Computing it is as hard as deciding whether programs halt, and Chaitin's constant, the probability that a random program halts, encodes the answers to the halting problem.

Chaitin complexity is a complex and challenging concept, but it has had a profound impact on our understanding of information and computation. Chaitin complexity has also led to the development of new tools and techniques for studying complex systems.

*Learning Theories: Computational, Statistical and Algorithmic*

[Algorithmic learning theory](https://en.m.wikipedia.org/wiki/Algorithmic_learning_theory) is a branch of computational learning theory. One of its concerns is efficiency: minimizing the number of data points required before convergence to a correct hypothesis.

[Statistical learning theory](https://en.wikipedia.org/wiki/Statistical_learning_theory)

[Computational learning theory](https://en.wikipedia.org/wiki/Computational_learning_theory)


https://levelup.gitconnected.com/computational-learning-theory-and-statistical-learning-theory-in-machine-learning-9dd2d8fb1aec

**What is the difference between statistical learning theory, computational learning theory and algorithmic learning theory?**

Statistical learning theory, computational learning theory, and algorithmic learning theory are all closely related fields of study that deal with the theoretical foundations of machine learning. However, they have different focuses and goals.

* **Statistical learning theory** is concerned with the development of mathematical models that enable machines to learn from data by identifying patterns and relationships. It is based on statistical principles and focuses on the estimation of probability distributions from data.
* **Computational learning theory** focuses on designing and analyzing algorithms that enable machines to learn from data with high confidence. It is concerned with the minimum amount of data required to achieve a certain level of accuracy, as well as the time and space complexity of the learning algorithms.
* **Algorithmic learning theory** is a subfield of computational learning theory that focuses on the design and analysis of specific machine learning algorithms. It is concerned with the theoretical guarantees of these algorithms, such as their correctness, efficiency, and robustness.

In general, statistical learning theory provides the theoretical foundation for machine learning, while computational learning theory studies the practical aspects of designing and implementing machine learning algorithms. Algorithmic learning theory is a more specialized field that focuses on the design and analysis of specific algorithms.

Here is a table that summarizes the key differences between the three theories:

| Theory | Focus | Goal |
|---|---|---|
| Statistical learning theory | Mathematical models | Develop models that can learn from data by identifying patterns and relationships |
| Computational learning theory | Algorithms | Design and analyze algorithms that can learn from data with high confidence |
| Algorithmic learning theory | Specific algorithms | Design and analyze specific machine learning algorithms with theoretical guarantees |
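
To make "the minimum amount of data required to achieve a certain level of accuracy" concrete, here is a small worked example. It uses a standard textbook bound that is not stated in the notes above: for a finite hypothesis class $H$ in the realizable PAC setting, a learner that outputs any hypothesis consistent with $m \geq \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$ examples has error at most $\varepsilon$ with probability at least $1-\delta$. The function name `pac_sample_bound` is hypothetical.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Sufficient number of examples for a consistent learner over a finite class.

    Implements m >= (ln|H| + ln(1/delta)) / epsilon (realizable PAC setting).
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# Example: 2**10 hypotheses, target error 10%, failure probability 5%.
print(pac_sample_bound(2 ** 10, epsilon=0.1, delta=0.05))
```

The bound grows only logarithmically in the number of hypotheses and in $1/\delta$, but linearly in $1/\varepsilon$.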



See also: https://en.m.wikipedia.org/wiki/Computational_complexity_theory

###### *Quantum Machine Learning*

> **Quantum Machine Learning: understand the ultimate physical limits to data analysis**

**Summary**

* Measure the performance of quantum algorithms (vs classical)
* How many times to run the algorithm
* Runtime of the algorithm
* Average error vs. worst-case error
* Sample complexity (how much data is needed at minimum?)
* Model complexity (what capacity does the model have?)


> *Challenges QML: Trainability (expressibility) vs Generalization (capacity) vs Barren plateaus*

**quantum-aware optimizers:**
* shot-frugal optimizers [51–54] can employ stochastic gradient descent while adapting the number of shots (or measurements)
* Quantum natural gradient [55, 56] adjusts the step size according to the local geometry of the landscape (based on the quantum Fisher information metric).

**Optimal quantum sample complexity of learning algorithms** (2017)

Paper: https://arxiv.org/abs/1607.00932

* analyze quantum sample complexity, where each example is a coherent quantum state

* Bshouty and Jackson [BJ99]: showed that quantum examples are more powerful than classical examples in some fixed-distribution settings. But Atıcı and Servedio [AS05], improved by Zhang [Zha10], showed that in the PAC setting (where the learner has to succeed for every distribution), quantum examples cannot be much more powerful

* Our main result is that **quantum and classical sample complexity are in fact equal up to constant factors** in both the PAC and agnostic models

* quantum sample complexity:
  * For learning DNF under the uniform distribution from classical examples, the best upper bound is quasipolynomial time [Ver90].
  * With the added power of “membership queries,” where the learner can actively ask for the label of any x of his choice, DNF formulas are known to be learnable in polynomial time under uniform D [Jac97],
  * but without membership queries polynomial-time learnability is a longstanding open problem (see [DS16] for a recent hardness result).

* How many quantum examples are needed to learn a concept class C of VC dimension d?
  * Since a learner can just measure a quantum example in order to obtain a classical example, the upper bounds on classical sample complexity trivially imply the same upper bounds on quantum sample complexity.
  * But what about the lower bounds? Are there situations where quantum examples are more powerful than classical ones? Indeed there are:
    * (1) learning DNF under the uniform distribution without membership queries.
    * (2) learnability of the concept class of linear functions over $\mathbb{F}_2$, $C = \{c(x) = a \cdot x : a \in \{0,1\}^n\}$, again under the uniform distribution D. A classical learner needs about n examples to learn an unknown c ∈ C under this D. However, given one quantum example, a small modification of the Bernstein-Vazirani algorithm [BV97] can recover a (and hence c) with probability 1/2. Hence O(1) quantum examples suffice to learn c exactly, with high probability, under the uniform distribution.
    * But: the success probability of the Bernstein-Vazirani algorithm deteriorates sharply when D is far from uniform. That does not, however, rule out the existence of other quantum learners that use o(n) quantum examples and succeed for all D.
* Our main result in this paper is that **quantum examples are not actually more powerful than classical labeled examples** in the PAC model and in the agnostic model: we prove that the lower bounds on classical sample complexity of Eq. (1) and Eq. (2) hold for quantum examples as well.
* Accordingly, despite several distribution-specific speedups, **quantum examples do not significantly reduce sample complexity if we require our learner to work for all distributions D**. This should be contrasted with the situation when considering the time complexity of learning.
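
The Bernstein-Vazirani remark above can be checked numerically for small n. The sketch below is a plain NumPy state-vector simulation, assuming the uniform quantum example is the state $\frac{1}{\sqrt{2^n}}\sum_x |x, a\cdot x\rangle$ and that the learner applies a Hadamard gate to all $n+1$ qubits before measuring. The function names and the choice $a = 101$ are illustrative assumptions.

```python
import numpy as np


def quantum_example_state(a_bits):
    """Uniform quantum example for c(x) = a.x mod 2; the label is the last qubit."""
    n = len(a_bits)
    state = np.zeros(2 ** (n + 1))
    for x in range(2 ** n):
        x_bits = [(x >> (n - 1 - i)) & 1 for i in range(n)]
        label = sum(ai * xi for ai, xi in zip(a_bits, x_bits)) % 2
        state[2 * x + label] = 1 / np.sqrt(2 ** n)
    return state


def hadamard_all(num_qubits):
    """Matrix of a Hadamard gate applied to every qubit."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    full = np.array([[1.0]])
    for _ in range(num_qubits):
        full = np.kron(full, h)
    return full


def outcome_probabilities(a_bits):
    """Measurement distribution after Hadamards on all n+1 qubits."""
    n = len(a_bits)
    state = hadamard_all(n + 1) @ quantum_example_state(a_bits)
    return state ** 2  # amplitudes are real in this construction


a = [1, 0, 1]
probs = outcome_probabilities(a)
a_index = int("".join(map(str, a)), 2)

# Outcome (a, 1): the label qubit reads 1 and the first n qubits spell out a.
print("P(first n bits = a, last bit = 1):", probs[2 * a_index + 1])
# Outcome (0...0, 0): the uninformative branch.
print("P(all zeros):", probs[0])
```

Whenever the last qubit is measured as 1, the first n bits equal a, so a single quantum example succeeds with probability 1/2 and repeating a constant number of times succeeds with high probability.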

**Exponential separations between classical and quantum learners**

Paper: https://arxiv.org/abs/2306.16028. Old paper: [On establishing learning separations between classical and quantum machine learning with classical data (2022)](https://arxiv.org/abs/2208.06339)

* "*The hypotheses could be specified by a polynomial-sized Boolean circuit, in which case they are **classically efficiently evaluatable**. On the other hand, the hypotheses could also be specified by polynomial-depth quantum circuits, in which case they are **quantumly efficiently evaluatable**.*"

* studying separations between classical and quantum learners, we make the distinction whether the hypotheses are efficiently evaluatable classically or quantumly (=hypotheses classes)

* from [PAC learning](https://en.m.wikipedia.org/wiki/Probably_approximately_correct_learning): the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions

* **Vedran: learning problems where quantum learning algorithms can achieve a provable exponential speedup**

* One of the key challenges of the quantum machine learning field is **identifying learning problems where quantum learning algorithms can achieve a provable exponential speedup over classical learning algorithms**.

* The currently known examples are all contrived, and all rely on cryptographic methods to make learning hard for a classical learner. However, the broadly shared intuition is that learning separations should in particular be most apparent when the learning task involves data from genuinely quantum sources, which are very different from cryptographic problems.

* In this talk I will discuss this apparent discrepancy and present a series of new results showing various types of advantages that can be obtained. In particular, we show how many complex physical systems have associated learning problems which exhibit exponential separations, subject to plausible assumptions in complexity theory, giving a confirmation of the intuition.


**Key questions for the quantum machine learner to ask themselves**

https://iopscience.iop.org/article/10.1088/1367-2630/abac39/meta

Motivation: In particular, it is still not clear what QML even means. It is further not clear how much quantitative improvement quantum models can offer for classical datasets and what fundamental limitations are imposed on QML from quantum information theory and quantum query complexity.



**1. What do I mean by quantum machine learning?**
* For classical ML the definition is fairly coherent: the study of algorithms whose performance is derived from past experience, fed to them in the form of a training data set
* **Answer 1: QML involves using a quantum device to solve a machine learning task with greater speed or accuracy than its classical analogue would allow**.
  * This is the least ambitious definition of QML. Answer 1 captures much of the flavor of QML, but it also falls short in important ways: it presupposes that a unique classical analogue exists.
  * **computer scientist perspective: reduce sample, query and time complexities**, e.g. quantum PAC learning in which quantum and classical sample complexities are found to be equivalent but time-complexity is not
  * **data scientist perspective: empirical performance of quantum classifiers (optimizing a test loss function)**. goal for quantum computer is to provide a richer class of models for the data, meaning that by incorporating quantum effects one can model correlations in a data set that would otherwise be difficult to describe using traditional models
* **Answer 2: QML involves using a quantum device to classify or extract features from quantum states**.
  * Because of the challenges in finding an exponential speedup on quantum computers for classically important machine learning tasks, work here tends to focus on problems where the data itself is hard to generate or store classically, e.g. quantum neural networks to identify quantum phase transitions
  * The aim is a quantum advantage in either sample or time complexity, but in both cases it is difficult to compare the costs of quantum and classical machine learning problems. In particular, if the quantum states that define the training set were specified as bit strings then the classical analogue would require exponential input.
  *  If one has access to a unitary process for preparing the input quantum states then this affords the algorithm possibilities that sometimes cannot be granted by models for QML where the training data is provided through a non-invertible procedure
  * strong connections between learning algorithms that take as input quantum data and quantum characterization verification and validation. Protocols such as quantum Hamiltonian learning, shadow tomography and other protocols adjacent to these fields can also be viewed as QML


2. **Question 2: what is the classical analogue of my quantum machine learning algorithm?**
* exponential speedups are often not justified using complexity theoretic, query complexity arguments or even the inability for the community to provide an efficient classical algorithm for the problem (as in the case of factoring); rather, they were justified by comparison to a classical algorithm that was deemed to be analogous to the quantum algorithm.
* The central problem with such a comparison is that there often are multiple classical analogues of a quantum algorithm. In all these cases, the claims of exponential speedups evaporated when the quantum algorithm was compared against the correct classical analogue.
* The struggle is very much akin to **understanding the classical limit of quantum mechanics**. There are two main correspondences that often are used to understand the classical limit of quantum mechanics:
  * The Ehrenfest correspondence states that a single Newtonian trajectory should emerge from the correct classical limit of quantum dynamics.
  * The Liouville correspondence states instead that the classical limit of quantum dynamics occurs when the quantum predictions are practically indistinguishable from those of a classical probability distribution. This exact same issue arises in QML.
* **Question 2.1: what is my input model?**
  * classical input models: database model (in memory) and streaming model
  * there are also multiple ways that the data can be represented: bit encoding (coherent and incoherent bit-oracle), amplitude-encoding, quantum input model..
* **Question 2.2: what does my algorithm output?**
  * e.g. a class label or potentially an estimate of a gradient for the weights (i.e. the parameters) of the model
  * For other quantum algorithms, the output is often taken to be a quantum state (for example when the QML algorithm is a subroutine in a larger algorithm), since there are no clear classical outputs for such algorithms.
  * "Output problem": several of the most significant quantum algorithms in the field suffer from the problem of extracting classically meaningful data from the quantum output. The cost of extracting the desired information from the solution dominates the cost of the quantum algorithm.
* Answer 1: I am using a bit encoding so standard classical algorithms are a natural analogue to my quantum algorithm
* Answer 2: I am using an amplitude encoding so randomized classical ML algorithms are a natural analogue of my quantum algorithm.
* Answer 3: I am learning directly from quantum states and so classical analogues of my algorithm may exist in the quantum tomography literature.


*Nota bene: coherent and incoherent encoding*

A vector of real numbers can be encoded into the amplitudes of a quantum state such that each number in the vector corresponds to the amplitude of a basis state. This is a compact and efficient way to represent classical data in a quantum system, and it forms the basis for many quantum machine learning algorithms.

Incoherent amplitude encoding refers to a specific kind of amplitude encoding where the quantum states are diagonal in the computational basis. This means that each basis state (e.g., |0>, |1>, |2>, etc. for a qubit system) is associated with a certain probability, rather than a complex amplitude.

It is called "incoherent" because there is no phase relation or interference between the different basis states, unlike in the general case of quantum superposition. Incoherent amplitude encoding is less powerful than the general (coherent) amplitude encoding, but it may be easier to implement in practice because it does not require maintaining phase coherence between different parts of the system.
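
The difference between the two encodings can be made concrete with density matrices. The sketch below is a NumPy illustration under simple assumptions: the input vector is real, it is zero-padded to the next power of two, and the "incoherent" version is modeled as the diagonal density matrix with the same measurement probabilities. All function names are hypothetical.

```python
import numpy as np


def amplitude_encode(values):
    """Normalize a real vector and zero-pad it to a power-of-two dimension."""
    vec = np.asarray(values, dtype=float)
    norm = np.linalg.norm(vec)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    dim = 2
    while dim < len(vec):
        dim *= 2
    padded = np.zeros(dim)
    padded[: len(vec)] = vec / norm
    return padded


def coherent_density_matrix(state):
    """Pure state |psi><psi|: off-diagonal entries carry phase relations."""
    return np.outer(state, state.conj())


def incoherent_density_matrix(state):
    """Diagonal mixed state with the same probabilities but no coherences."""
    return np.diag(np.abs(state) ** 2)


psi = amplitude_encode([3.0, 4.0, 0.0])       # -> amplitudes (0.6, 0.8, 0, 0)
rho_coherent = coherent_density_matrix(psi)
rho_incoherent = incoherent_density_matrix(psi)

print("amplitudes:", psi)
print("coherent off-diagonal (0,1):  ", rho_coherent[0, 1])
print("incoherent off-diagonal (0,1):", rho_incoherent[0, 1])
```

Both matrices give identical outcome probabilities in the computational basis; they differ only in the off-diagonal terms, which is exactly what interference-based algorithms rely on.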

3. **Question 3: how well does my QML algorithm compare to corresponding classical approaches?**
* This question is perhaps the most important, but also the hardest to address, in QML. You can benchmark the performance of a protocol by the value of a loss function, the time complexity of an algorithm, or the number of samples
* **Answer 1: my QML algorithm, which uses amplitude encoding, may be exponentially worse than classical algorithms that use a bit encoding.**
* **Answer 2: I really do not know, but I have uncompelling numerical/experimental results that may show advantages for an objective function I chose on a few small data sets**
  * Many of the features that make empirical approaches so successful for classical ML are absent from QML. Less emphasis is being placed on looking at QML from a computer science perspective, and more on examining QML from a data science perspective.
  * Quantum data sets have also yet to go through the standardization process that allowed the empirical performance of classical ML algorithms to be compared. Most comparisons are done on very small synthetic data sets that are easily scalable
* **Answer 3: I have not done a solid empirical comparison but I see polynomial advantages in either query/sample/time complexity or have complexity theoretic arguments for why my scheme cannot be efficiently simulated classically.**
  * A nice side effect of trying to answer the question in this fashion is that it forces you to think about what metrics you will be using to compare your quantum algorithm to its classical analogue.


4. **What fundamental limitations emerge when trying to build QML protocols?**
* Answer 1: there are complexity theoretic reasons why I cannot solve this QML task in polynomial time
  * Many problems in machine learning can be recast as optimization problems that are **NP-hard**. K-means clustering is an excellent example of such a protocol, and as a result we strongly suspect that quantum computers will be unable to provide exponential speedups for such problems.
  * On the other hand, other protocols such as training quantum Boltzmann machines can reduce to problems that are **QMA-hard**.
  * Other tasks, such as preparing Gibbs states using quantum rejection sampling, involve implicitly estimating partition functions. This task is **#P-hard**, which means that efficient exact algorithms for these protocols are highly unlikely to exist.
  * lack of standardized metrics to compare
* Answer 2: there are information theoretic limitations that impact the performance of my algorithm.
  * complexity theoretic arguments vs information theoretic arguments (bounds) for limitations on QML protocols
* Answer 3: there are arguments from random matrix theory that training my model will be generically hard
  * with high probability the gradients of training objective functions in large unstructured quantum circuits are concentrated exponentially closely to zero.
  * Such problems, it should be noted, do not usually arise in classical bit encodings because of the presence of backpropagation, which can rely on double precision to apply the chain rule implicitly and compute gradients of the training objective function
  * development of quantum backpropagation algorithms does not address this issue in most cases because of the precision requirements on the outputs. For these reasons, it is my belief that older approaches to the gradient decay problem, such as generative pre-training, may play a larger role in QML than it presently plays in classical machine learning.
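
The gradient-concentration claim under Answer 3 can be illustrated in a much simpler setting than the large unstructured circuits it refers to. The sketch below is a toy model of my own choosing, not taken from the paper: n qubits, one `RX(θ_i)` rotation on each, and the global cost $C(\theta) = 1 - \prod_i \cos^2(\theta_i/2)$ (one minus the probability of measuring all zeros). Its first partial derivative is $\tfrac12 \sin\theta_1 \prod_{i>1}\cos^2(\theta_i/2)$, whose variance over uniformly random angles is $\tfrac18\left(\tfrac38\right)^{n-1}$, i.e. exponentially small in n.

```python
import numpy as np


def gradient_first_param(thetas):
    """d/d(theta_1) of C = 1 - prod_i cos^2(theta_i / 2), vectorized over samples."""
    rest = np.prod(np.cos(thetas[..., 1:] / 2) ** 2, axis=-1)
    return 0.5 * np.sin(thetas[..., 0]) * rest


def gradient_variance(num_qubits, num_samples=200_000, seed=0):
    """Monte Carlo estimate of the gradient variance for uniform random angles."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, 2.0 * np.pi, size=(num_samples, num_qubits))
    return float(np.var(gradient_first_param(thetas)))


def exact_variance(num_qubits):
    """Closed form (1/8) * (3/8)^(n-1) for the toy model above."""
    return (1.0 / 8.0) * (3.0 / 8.0) ** (num_qubits - 1)


for n in (1, 2, 4, 8):
    print(n, gradient_variance(n), exact_variance(n))
```

In this toy case the decay comes purely from the global cost function; the random-circuit results cited above are a separate and more general mechanism.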



**On quantum backpropagation, information reuse, and cheating measurement collapse**

https://arxiv.org/abs/2305.13362

Video: [Amira Abbas: On quantum backpropagation, information reuse and cheating measurement collapse](https://youtu.be/a6KNC6KQzCw)

* there is connection between problem of estimating gradients and problem of shadow tomography

![sciences](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1625.jpg)


1. **Introduction**
* computing the gradient of a neural network function with respect to all its parameters, can be done at a total cost roughly proportional to running the function, instead of incurring an additional factor proportional to the number of parameters.
* backpropagation in classical circuits remains exponentially more efficient with respect to the number of parameters, than the best known algorithms for determining gradients of parameterized quantum circuits
* We investigate learning algorithms with memory (multiple copies) and without quantum memory
* closely related probabilistic classical analogues can exhibit backpropagation scaling, which points out that the barrier in the quantum setting is due to quantum phenomena.
* leverages information reuse with gentle measurement combined with online learning, a technique that has proven useful in problems like shadow tomography as it aims to conserve quantum resources
* Unfortunately, the true computational efficiency of our scheme remains an open question
2. **Backpropagation scaling**
3. **Variational quantum models**
* present a slightly restricted model for ease of analysis which still covers a very broad range of practical scenarios.
* The problem is that intermediate information in a quantum circuit is not easily retrievable without consequence, which is investigated in the following section.
4. **Learning algorithms without quantum memory**
* learning algorithm without quantum memory would perform operations and measurements on each individual copy of a quantum state
* no quantum memory = backpropagation scaling is not possible in the general case.
* when considering variational models, a different query model must be applied and the original single-query advantage (from single black-box query) becomes unattainable.
* It is sometimes argued that the simultaneous perturbation stochastic approximation (SPSA) algorithm is computationally efficient since it requires only two function evaluations to estimate the gradient, irrespective of M. However, as M increases, the variance of the gradient estimate increases, so more measurements over more samples are needed, which increases the cost.
* other sampling schemes constructed to estimate the gradient of F(θ), like the parameter-shift rule, perform destructive measurements that typically only retrieve a partial amount of information for one component of the gradient
* reducing the infinity norm error in the gradient with reasonable probability, has a cost that scales in a way that it does not achieve backpropagation scaling.
* While this quadratic dependence on the number of parameters may not seem problematic, a linear dependence was the necessary catalyst in the age of modern deep learning - wait up to a day to evaluate a single gradient estimate of a model with fewer than 10 000 parameters.
* perhaps neural networks are not a fair benchmark. direct classical analogue for these parameterized quantum circuits attains backpropagation scaling: Parameterized Markov chains, which are much closer classical analogues to variational models than neural networks, exhibit backpropagation scaling.
* proof and scaling comparison in Appendix B by drawing an analogy between quantum and classical probabilistic states. The formulation of this classical-quantum analogy allows us to probe the root cause of why backpropagation scaling is so difficult to obtain in the quantum variational setting: The origin of the challenge lies within quantum measurement collapse and the inability to read out intermediate states while continuing a computation, rather than the probabilistic formulation of the problem.
* In the classical setting, one is always promised to be in a computational basis state, making it possible to do perfect measurements non-destructively at intermediate steps
* this leads one to wonder whether backpropagation scaling is possible when one has access to multiple copies - perhaps there is some middle ground where one could perform measurements that are only partially destructive on multiple copies. This idea has led to breakthroughs in the shadow tomography problem
5. **Reusing multiple copies through gentle measurement**
* By allowing access to multiple copies of ρ, it is especially interesting to note that gentle measurements can facilitate backpropagation scaling in all resources
* By restricting to Pauli operators, one may estimate the magnitude of all gradient components with $O(\log M/\varepsilon^4)$ copies of ρ via application of V and two-copy Bell measurements of the resulting state that harness gentleness
* Similarly, using $O(\log(M)/\varepsilon^2)$ copies along with the magnitude information from the Bell measurements, one may estimate the sign of the gradient components using a majority vote scheme that also exploits gentleness
* This result indicates that there are at least some choices of circuits and generators for which backpropagation scaling can be achieved using gentle measurements
* This leads one to ask whether this may be possible in more general cases with techniques like shadow tomography [22]. However, with just a small perturbation away from this special case, the same technique no longer works, and the general computational efficiency remains unknown
5.1. A quantum-efficient protocol for backpropagation
* Our main contribution in this more general quantum data setting with multi-copy access, is the establishment of a connection between gradient estimation and shadow tomography. This gives an exponential improvement to the sample complexity of the input state
* It also gives a quadratic improvement in the number of quantum operations
* large caveat: it requires the classical storage and manipulation of a hypothesis state, which results in an exponential classical overhead
* Nevertheless, the exponential saving in sample complexity could be important in settings where the labelled quantum states coming from Nature are limited, and valuable
* In [30 - quantum advantage in learning from experiments] for example, there were sources of quantum data that, when limited in quantity, could achieve a substantial data advantage over classical learners – even in the range of 20-40 qubits
5.2 Reduction to shadow tomography
* We now show that a fully efficient algorithm for computing gradients would give rise to a fully efficient shadow tomography procedure for observables which can be efficiently implemented.
* This very general class of observables, however, is not known to have a computationally efficient shadow tomography protocol.
* But, in general, it is not trivial to remove the exponential classical cost when it comes to shadow tomography.
* as there is no known classically efficient procedure for shadow tomography with respect to poly-time observables, this reduction illustrates the difficulty of replicating true backpropagation scaling in general.
5.3 A fully gentle gradient strategy
* Shadow tomography makes use of multiple copies and a hypothesis state model, often stored classically, to require a minimal number of destructive measurements.
* It is useful to examine the limits of gentle measurement alone for gradient estimation in order to reduce the classical overhead.
* While we do not explicitly construct a protocol here, this capability would naturally lead to a scheme for gradient estimation that achieves backpropagation scaling.
* Performing this procedure using standard, destructive, measurements of the POVMs would require a fresh set of oracle queries with each round. However, using sufficiently gentle measurements removes this requirement. If the distance between the pre- and post-measurement states is sufficiently small, one obtains results that are close to those that one would obtain from a fresh copy of the state.
* we demand that the state obtained after any number of measurement rounds be within $2^{-n}/3$ in the trace distance of the state prior to any measurements
* In Appendix E.4, we discuss how sufficiently gentle measurements lead to a violation of known bounds when considering a different notion of time complexity that combines measurements and oracle queries.
5.4 Approximate schemes
* key challenge in the general application of the proposed shadow tomography algorithm is the use of an explicit classical representation of the quantum state, which in general, scales exponentially with system size.
* A few special cases with fully efficient schemes have been found, such as the Pauli operator case
* However, an exact scheme may not be required in practice, especially when dealing with noisy data. This raises the possibility of using approximate classical representations of the state.
* For example, it is known that in cases where states exhibit low entanglement, they may be efficiently represented by matrix product or tensor network states
* Moreover, in the case of shadow tomography, one is not explicitly seeking an exact representation of the density matrix, but rather a proxy, capable of reproducing the desired observables with high probability
6. **Discussion**
* inadequacy of current gradient methods to provide backpropagation scaling in parameterized quantum models
* If the difficulty to achieve an efficient scaling is due to inherently quantum properties, perhaps backpropagation is not the correct method for optimization of quantum models, which seems to be a growing belief for classical models too, albeit for completely different reasons
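
The two gradient estimators mentioned in section 4 of the outline above (the parameter-shift rule and SPSA) can be compared on a toy model. The sketch below assumes a product circuit with one `RX(θ_i)` rotation per qubit acting on $|0\dots0\rangle$ and the observable $Z\otimes\dots\otimes Z$, so that $F(\theta)=\prod_i\cos\theta_i$ can be evaluated exactly without shot noise. The model, the function names, and the step size are illustrative assumptions, not the setting analyzed in the paper.

```python
import numpy as np


def model(thetas):
    """Toy expectation value <Z...Z> after RX(theta_i) on each qubit of |0...0>."""
    return float(np.prod(np.cos(thetas)))


def parameter_shift_gradient(f, thetas):
    """Exact gradient for Pauli-rotation parameters; costs 2*M evaluations of f."""
    thetas = np.asarray(thetas, dtype=float)
    grad = np.zeros_like(thetas)
    for i in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[i] = np.pi / 2
        grad[i] = 0.5 * (f(thetas + shift) - f(thetas - shift))
    return grad


def spsa_gradient(f, thetas, step=0.1, rng=None):
    """One SPSA estimate: 2 evaluations of f regardless of M, but noisy."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = np.asarray(thetas, dtype=float)
    delta = rng.choice([-1.0, 1.0], size=thetas.shape)  # random +/-1 directions
    diff = f(thetas + step * delta) - f(thetas - step * delta)
    return diff / (2.0 * step * delta)


thetas = np.array([0.3, 1.1, -0.7])
analytic = np.array(
    [-np.sin(t) * np.prod(np.cos(np.delete(thetas, i))) for i, t in enumerate(thetas)]
)

print("analytic:       ", analytic)
print("parameter shift:", parameter_shift_gradient(model, thetas))
print("single SPSA:    ", spsa_gradient(model, thetas, rng=np.random.default_rng(0)))
```

The parameter-shift result matches the analytic gradient but needs two evaluations per parameter, while a single SPSA estimate uses two evaluations in total and is only correct on average.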

* **Data classical and algorithm classical**:
  * ***No quantum advantage for average prediction error (on quantum data) and batch size (on classical data)***
  * using classical methods to understand quantum physics for classical machine learning
  * using tensor networks (developed to understand and calculate with quantum states for neural networks)
  * methods of using neural networks to represent quantum states
* **Data classical and algorithm quantum**:
  * ***Exponential speedup exists for streaming tasks*** https://arxiv.org/abs/2106.04633
  * how to use quantum computers to do machine learning.
  * the most common understanding of quantum machine learning
* **Data quantum and algorithm classical**:
  * ***Speedup exists to learn arbitrary quantum states classically*** (https://arxiv.org/abs/2210.14894)
  * data from quantum experiments
  * quantum error correction with machine learning
  * interpret or shortcut [quantum state tomography](https://en.m.wikipedia.org/wiki/Quantum_tomography) with machine learning
* **Data quantum and algorithm quantum**:
  * ***Speedup exists for worst-case prediction error (not average prediction error)*** https://arxiv.org/abs/2101.02464
  * ***Exponentially fewer data necessary*** https://arxiv.org/abs/2112.00778
  * We receive quantum data not as measurement outcomes but as the quantum states themselves
  * For example, a cryptography experiment outputs photons, which can be sent directly into a photonic quantum computer and processed by a quantum machine learning algorithm acting on the quantum data


**Out-of-distribution generalization for learning quantum dynamics**

https://www.nature.com/articles/s41467-023-39381-w

**Provably efficient machine learning for quantum many-body problems**

* Paper: https://arxiv.org/abs/2106.12627

* What kinds of problems are learnable from a little data?

* for interesting problems like learning ground states, in some cases a few pieces of data are sufficient to make interesting predictions about  ground state properties. - Is the fate of quantum computers to provide training data for classical models?


**Challenges and Opportunities in Quantum Machine Learning**

* Paper: https://www.nature.com/articles/s43588-022-00311-3

* Paper: https://arxiv.org/abs/2303.09491v1

**Information-theoretic bounds on quantum advantage in machine learning**

* https://arxiv.org/abs/2101.02464

* Quantum computers interface with the quantum parts of our world. You cannot substitute classical computation for that in the same way that you perhaps can in other places.


**Learning to predict arbitrary quantum processes**

Paper: https://arxiv.org/abs/2210.14894

Video: https://www.youtube.com/live/7YiZrmXcpOE?feature=share

**On establishing learning separations between classical and quantum machine learning with classical data**

* Paper: https://arxiv.org/abs/2208.06339

**Power of Data in Quantum Machine Learning**

Paper: https://arxiv.org/abs/2011.01938

Video: https://youtu.be/eSvib4kXkIk

**Quantum advantage in learning from experiments**

Paper: https://arxiv.org/abs/2112.00778

Blog: https://ai.googleblog.com/2022/06/quantum-advantage-in-learning-from.html

Video 1: [Quantum advantage in learning from experiments](https://www.youtube.com/watch?v=HY7IhKN03Vk&list=WL&index=3)

Video 2: https://youtu.be/unxgUmrk6Jo

When can a quantum computer be replaced by a dataset?

**Improved bounds on quantum learning algorithms** (PAC)

Paper: https://arxiv.org/abs/quant-ph/0411140



**Efficient Tomography of Non-Interacting Fermion States**

Paper: https://arxiv.org/abs/2102.10458

**The learnability of quantum states** (PAC)

Paper: https://arxiv.org/abs/quant-ph/0608142

***Challenge 2: Barren Plateaus***

https://www.nature.com/articles/s41467-018-07090-4

Classical methods to overcome vanishing gradient:

* Better initialization
* ResNet networks with skip connections
* ReLU activation function

https://pennylane.ai/qml/demos/tutorial_barren_plateaus.html

**Algebraic solution to solve barren Plateaus**

Lie algebra: tells me which states I can reach when I start at a given state.

How can we tell whether a barren plateau or overparametrization will occur before running the quantum neural network? With the Lie algebra! Overparametrization (too much capacity) arises when the quantum Fisher information matrices (QFIM) simultaneously saturate their achievable rank; more parameters aren't needed anymore. Link to the Lie algebra: the maximum rank of each QFIM is upper bounded by the dimension of the Lie algebra $\mathfrak{g}$.


From: [QHack 2022: Marco Cerezo —Barren plateaus and overparametrization in quantum neural networks](https://www.youtube.com/watch?v=rErONNdHbjg)
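
A minimal numerical sketch of the Lie algebra idea above, assuming a single-qubit circuit whose gates are generated by Pauli operators. The helper names (`real_rank`, `lie_algebra_dimension`) are hypothetical and not taken from the talk. The code closes a set of generators under commutators and counts the linearly independent elements, which gives the dimension that upper bounds the QFIM rank.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def real_rank(mats):
    """Rank of a list of matrices viewed as vectors over the real numbers."""
    rows = [np.concatenate([m.real.ravel(), m.imag.ravel()]) for m in mats]
    return np.linalg.matrix_rank(np.array(rows))

def lie_algebra_dimension(generators):
    """Dimension of the real Lie algebra spanned by {i*H_k} and nested commutators."""
    basis = []

    def try_add(m):
        # Keep m only if it is non-zero and linearly independent of the basis so far.
        if np.allclose(m, 0) or real_rank(basis + [m]) == len(basis):
            return False
        basis.append(m)
        return True

    for h in generators:
        try_add(1j * h)
    changed = True
    while changed:
        changed = False
        for a in list(basis):
            for b in list(basis):
                if try_add(a @ b - b @ a):
                    changed = True
    return len(basis)

dim_x = lie_algebra_dimension([X])       # rotations about a single axis
dim_xy = lie_algebra_dimension([X, Y])   # X and Y together generate all of su(2)
print(dim_x, dim_xy)
```

With only `X` as generator the algebra is one-dimensional, while `X` and `Y` together close to the three-dimensional algebra su(2), so in this toy setting more than three parameters cannot raise the QFIM rank further.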

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1401.png)

Exponentiate the Lie algebra to get the Lie group:

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1400.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1402.png)
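
The exponentiation step can be sketched numerically for one qubit. This is an illustration under assumptions: the coefficients of `h` are arbitrary, and `exp_minus_i` is a hypothetical helper that uses an eigendecomposition instead of a library matrix exponential.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def exp_minus_i(h):
    """exp(-i*h) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w)) @ v.conj().T

# -i*h is an element of the Lie algebra su(2) (anti-Hermitian, traceless).
h = 0.3 * X + 0.5 * Y - 0.2 * Z
# Exponentiating it gives an element of the Lie group SU(2).
u = exp_minus_i(h)

is_unitary = np.allclose(u.conj().T @ u, np.eye(2))
has_unit_det = np.isclose(np.linalg.det(u), 1.0)
print(is_unitary, has_unit_det)
```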

**Use unbounded objective function**

* The existing barren plateau no-go results don't apply to unbounded objective functions. Almost all of QML uses bounded operators.
* The classical KL divergence would become the quantum relative entropy, but that's too hard to compute. Better: maximal quantum Rényi divergence.
* Compute it with the extended swap test (generalizes the swap test and the Hadamard test).
* Learning thermal states: generative algorithm for thermal state learning; requires access to an LCU (linear combination of unitaries) decomposition of the Hamiltonian.

Video: [Maria Kieferova - Training quantum neural networks with an unbounded loss function - IPAM at UCLA](https://www.youtube.com/watch?v=01xvtDu94jM&list=WL&index=4&t=352s)

Abstract: Quantum neural networks (QNNs) are a framework for creating quantum algorithms that promises to combine the speedups of quantum computation with the widespread successes of machine learning. A major challenge in QNN development is a concentration of measure phenomenon known as a barren plateau that leads to exponentially small gradients for a range of QNN models. In this work, we examine the assumptions that give rise to barren plateaus and show that an unbounded loss function can circumvent the existing no-go results. We propose a training algorithm that minimizes the maximal Renyi divergence of order two and present techniques for gradient computation. We compute the closed form of the gradients for Unitary QNNs and Quantum Boltzmann Machines and provide sufficient conditions for the absence of barren plateaus in these models. We demonstrate our approach in two use cases: thermal state learning and Hamiltonian learning. In our numerical experiments, we observed rapid convergence of our training loss function and frequently achieved a 99% average fidelity in fewer than 100 epochs.


> you can achieve efficient time complexity (cost model for time is polylog), but still exponential space (memory) is needed

Quantum Backpropagation: which optimizer? neural nets vs kernel vs others?

In collaboration with researchers at Google and Caltech, we have released a new research paper demonstrating a core problem in current quantum machine learning proposals: optimization using gradients will NOT scale as efficiently as deep neural networks equipped with backpropagation.

These results illustrate that, unless better quantum models or optimization methods are found, quantum machine learning and variational approaches in general, have a large hurdle to overcome in order to practically compete with state-of-the-art deep learning models.

We propose a way forward by reducing the task of estimating gradients to shadow tomography - a well studied problem in quantum information theory. Using ideas from shadow tomography, we develop a "quantum-efficient" protocol for computing gradients, but highlight the unavoidable classical costs involved.

https://arxiv.org/abs/2305.13362

**Expectation Value and Backpropagation**

The average value (expectation value) of the measurement result is given by the Born rule:

> $\langle B\rangle=\left\langle\psi\left|U^{\dagger}(\theta) B U(\theta)\right| \psi\right\rangle$

Just linear algebra! Every step is a matrix-vector or matrix-matrix multiplication

Expectation values depend continuously on the gate parameters
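
To make the "just linear algebra" point concrete, here is a small sketch for a single qubit. The choices are assumptions for illustration only: $U(\theta)$ is an X rotation, $B$ is Pauli Z, and $|\psi\rangle = |0\rangle$.

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation about X: exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

Z = np.array([[1, 0], [0, -1]], dtype=complex)  # observable B
psi = np.array([1, 0], dtype=complex)           # initial state |0>

def expectation(theta):
    """<psi| U^dagger(theta) B U(theta) |psi> as plain matrix-vector products."""
    state = rx(theta) @ psi
    return float(np.real(np.vdot(state, Z @ state)))

value = expectation(0.3)
print(value, np.cos(0.3))  # for this circuit <B> equals cos(theta)
```

For this particular circuit the expectation value is $\cos\theta$, which also shows the continuous dependence on the gate parameter.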

**Backpropagating Through Quantum Circuits**

However, as long as we don't "zoom in" to what is happening in the quantum circuit, backpropagation can treat the quantum circuit as a single indivisible function

The expectation value of a quantum circuit is a differentiable function

> $
f(\theta)=\left\langle\psi\left|U^{\dagger}(\theta) B U(\theta)\right| \psi\right\rangle=\langle B\rangle$

Running on hardware and using the parameter-shift rule, we can provide both ingredients needed by backpropagation

> $
\left(\langle B\rangle, \frac{\partial}{\partial \theta}\langle B\rangle\right)
$
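
A sketch of how the pair (value, derivative) could be obtained with the parameter-shift rule, again assuming a single X rotation acting on $|0\rangle$ and the observable Z. The function `expectation` stands in for a hardware run and is simulated here with NumPy; on a device it would be estimated from shots.

```python
import numpy as np

def expectation(theta):
    """Simulated <0| RX(theta)^dagger Z RX(theta) |0>."""
    state = np.array([np.cos(theta / 2), -1j * np.sin(theta / 2)])
    z = np.diag([1.0, -1.0])
    return float(np.real(np.vdot(state, z @ state)))

def parameter_shift(f, theta):
    """Derivative from two evaluations shifted by +pi/2 and -pi/2 (Pauli rotation gates)."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

theta = 0.7
value = expectation(theta)
grad = parameter_shift(expectation, theta)
print(value, grad)  # compare with cos(theta) and -sin(theta)
```

Only circuit evaluations are needed, so the classical side can treat the circuit as a black box that returns both ingredients.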

[Automatic Differentiation of Quantum Circuits](https://youtu.be/McgBeSVIGus)

[Variational Quantum Algorithms](https://youtu.be/YtepXvx5zdI)

[Hybrid Quantum-Classical Machine Learning](https://youtu.be/t9ytqPTij7k)

**What is backpropagation scaling?**

"Backpropagation scaling" usually refers to techniques that manage the magnitudes of gradients during the process of backpropagation, which is used to train neural networks. Proper scaling is important because it can impact the speed and effectiveness of learning.

Here are two common scaling issues that may arise during backpropagation:

1. **Vanishing gradients:** When deep neural networks are trained, gradients of the loss function can become very small. As a result, weight updates during training become insignificant, and the network takes a very long time to learn, if it can learn at all. This problem is particularly common with activation functions like the sigmoid or hyperbolic tangent, which squish a large input space into a small output range.

2. **Exploding gradients:** Conversely, gradients can also become very large, leading to large updates to the weights and causing the model to oscillate around the optimal solution, or even to diverge entirely. This is often a problem in recurrent neural networks (RNNs).

Various techniques have been proposed to mitigate these issues:

1. **Gradient clipping:** This is a common technique to prevent exploding gradients. If the norm of the gradient exceeds a certain threshold, we scale it back to prevent it from getting too large.

2. **Weight initialization:** Properly initializing the weights can prevent gradients from vanishing or exploding too quickly. For example, Xavier/Glorot and He initialization are strategies that consider the sizes of the input and output layers.

3. **Choice of activation function:** The choice of activation function can help alleviate the vanishing gradients problem. For example, ReLU (Rectified Linear Unit) and its variants (like Leaky ReLU and Parametric ReLU) are commonly used because they do not saturate for positive inputs.

4. **Batch normalization:** This technique normalizes the activations of each layer to prevent the distribution of inputs to each layer from changing too much during training, which can help mitigate both vanishing and exploding gradients.

5. **Use of optimizers:** Certain optimization algorithms like RMSProp, Adam, and Nadam can adaptively scale learning rates to mitigate both exploding and vanishing gradients.

In summary, backpropagation scaling techniques are important to ensure that the magnitude of updates during training is appropriate and to prevent issues related to vanishing or exploding gradients.
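
As an illustration of the first technique in the list, gradient clipping by norm can be sketched in a few lines. The function name `clip_by_norm` and the threshold are assumptions chosen for this example.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

large = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)   # norm 5 is scaled down to 1
small = clip_by_norm(np.array([0.3, 0.4]), max_norm=1.0)   # norm 0.5 is left unchanged
print(large, small)
```

The direction of the gradient is preserved; only its length is capped.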

**What are proposals to ensure backpropagation scaling in quantum machine learning?**

Ensuring proper backpropagation scaling in quantum machine learning is an active area of research. A few of the techniques that have been suggested for improving gradient scaling and thus learning in quantum machine learning systems include:

1. **Variational Quantum Algorithms:** In the field of quantum machine learning, Variational Quantum Algorithms (VQAs) like the Variational Quantum Eigensolver (VQE) or the Quantum Approximate Optimization Algorithm (QAOA) are commonly used. These algorithms employ a hybrid quantum-classical approach, which allows classical optimization techniques to be used in the learning process. This means some classical techniques for managing gradient scaling, like gradient clipping or adaptive optimizers, can still be applied in the classical optimization loop.

2. **Parameter Shift Rule:** The parameter-shift rule is a method for computing gradients in quantum circuits, which is particularly important for variational quantum algorithms. For certain types of quantum gates (such as Pauli rotations, whose generators have two distinct eigenvalues), the gradient of the expectation value of a quantum circuit with respect to a parameter is given exactly by a difference of expectation values, regardless of the number of qubits or the complexity of the circuit. On hardware each expectation value is still estimated from a finite number of shots, so the gradient estimate carries sampling noise.

3. **Randomized Layerwise Training:** Another strategy suggested for training deep quantum circuits involves training one layer at a time with a random initialization for the rest of the circuit, a technique inspired by classical machine learning strategies. This approach can help to alleviate barren plateaus -- regions in the cost function landscape where the variance of the gradients vanishes exponentially with increasing system size.

4. **Natural Gradient Descent:** There have been some initial studies into quantum natural gradient descent, which is an analogue to the classical natural gradient descent algorithm and is believed to be more robust to issues with gradient scaling.

It's important to note that the quantum machine learning field is quite young, and there is still much research to be done on these topics. Strategies that work well in practice for classical machine learning don't always have direct analogues in the quantum setting due to the fundamental differences between classical and quantum computations, and new techniques will likely need to be developed as the field progresses.

**Why can backpropagation not easily scale in quantum machine learning?**

Backpropagation, the algorithm used to optimize the parameters of a neural network in classical machine learning, relies on efficiently computing gradients of a cost function with respect to the network parameters. This is done by propagating errors backwards through the network, layer by layer, hence the name "backpropagation".

When it comes to quantum machine learning, where we deal with quantum data and quantum systems, the concept of backpropagation doesn't translate straightforwardly due to fundamental differences between classical and quantum systems.

Here are some reasons why backpropagation may not easily scale in quantum machine learning:

1. **Non-commutativity of operators**: Quantum operations, unlike classical ones, are represented by operators that generally do not commute, meaning the order in which they are applied matters. This non-commutativity makes the computation of gradients more complex than in the classical case.

2. **Measurement**: In quantum mechanics, obtaining information about the state of a system involves measurement, which is a probabilistic process that collapses the state of the system. The inherent randomness of measurement makes the direct application of backpropagation problematic.

3. **Complexity of quantum states**: Quantum states live in a complex vector space and can exist in a superposition of multiple states simultaneously, further complicating the process of backpropagation.

4. **Barren plateaus**: In the context of variational quantum algorithms, it's been found that the cost function landscape often suffers from the problem of "barren plateaus", where the function is flat almost everywhere. This makes it difficult for gradient-based methods like backpropagation to find a direction to move in to improve the function.

To address these challenges, researchers compute gradients of quantum circuits with methods such as the parameter-shift rule (on hardware) and the adjoint method (on simulators). However, the parameter-shift rule needs extra circuit evaluations for every parameter, so its cost grows with the number of parameters instead of staying at a constant multiple of one forward pass as in backpropagation, and the adjoint method needs access to the full state vector, which only a classical simulator provides. New methods and strategies to overcome these issues are an active area of research in quantum machine learning.
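
The barren plateau effect from point 4 can be probed numerically on small systems. The following is a rough sketch under assumptions, not a reproduction of any paper: the ansatz (layers of RY rotations followed by a chain of CZ gates), the cost $\langle Z_0 Z_1\rangle$, the depth, and the sample count are all choices made for this example. It estimates the variance of one partial derivative over random parameters for 2, 4 and 6 qubits.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    """RY(theta) = exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,)*n."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)

def apply_cz(state, q1, q2):
    """Controlled-Z: flip the sign of amplitudes where both qubits are 1."""
    state = state.copy()
    index = [slice(None)] * state.ndim
    index[q1], index[q2] = 1, 1
    state[tuple(index)] *= -1
    return state

def cost(thetas, n):
    """<Z_0 Z_1> after layers of RY rotations and a chain of CZ gates."""
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0
    for layer in thetas:
        for q in range(n):
            state = apply_1q(state, ry(layer[q]), q)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1)
    zz_state = apply_1q(apply_1q(state, Z, 0), Z, 1)
    return float(np.real(np.vdot(state, zz_state)))

def gradient_variance(n, layers=20, samples=100, seed=0):
    """Variance of d(cost)/d(theta[0, 0]) over uniformly random parameters."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(samples):
        thetas = rng.uniform(0, 2 * np.pi, size=(layers, n))
        plus, minus = thetas.copy(), thetas.copy()
        plus[0, 0] += np.pi / 2
        minus[0, 0] -= np.pi / 2
        # parameter-shift rule for the RY gate on qubit 0 in the first layer
        grads.append(0.5 * (cost(plus, n) - cost(minus, n)))
    return float(np.var(grads))

variances = {n: gradient_variance(n) for n in (2, 4, 6)}
for n, v in variances.items():
    print(n, v)
```

If the circuit is deep enough to scramble, the printed variance is expected to shrink quickly as the number of qubits grows, which is the signature of a barren plateau: more and more shots are needed to resolve a gradient of that size.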

**What exactly are the parameter-shift rule and the adjoint method, which are used to compute gradients of quantum circuits?**

The parameter-shift rule and the adjoint method are techniques used in quantum machine learning to compute gradients of quantum circuits. Let's break down each one.

1. **Parameter-shift rule**: This rule allows the derivative of a quantum circuit output with respect to its parameters to be computed in terms of circuit evaluations. Given a parameterized quantum gate (say, a rotation gate), the derivative of the expectation value of an observable with respect to the parameter can be computed as the scaled difference between the expectation values of the observable at two shifted values of the parameter. Unlike finite differences, the shifts are not small: for Pauli rotation gates they are typically $\pm\pi/2$, giving $\partial_\theta\langle B\rangle=\tfrac{1}{2}\left[\langle B\rangle(\theta+\pi/2)-\langle B\rangle(\theta-\pi/2)\right]$.

    The parameter-shift rule is powerful because it allows us to compute exact gradients using only additional evaluations of the quantum circuit, with no need for complex computations involving the inner workings of the quantum operations. This makes it well-suited to hardware implementations, as the additional circuit evaluations can be run on the quantum device.

2. **Adjoint method**: The adjoint method is a form of reverse-mode differentiation, a technique borrowed from classical automatic differentiation and adapted to simulated quantum circuits. It runs the circuit forward once and then steps backward through the gates, applying each gate's inverse (adjoint) to recover the earlier states instead of storing them, and accumulates the derivatives along the way.

   This method is efficient in the number of circuit executions, especially for circuits with many parameters but a single output (as is common in quantum machine learning models). However, it needs direct access to the state vector and its inner products, so it is used on classical state-vector simulators and cannot be run directly on quantum hardware.

Both methods are valuable tools for training variational quantum algorithms, where we seek to optimize the parameters of a quantum circuit to minimize a certain cost function. The choice between the two methods depends on factors like the specific structure of the quantum circuit, the capabilities of the quantum hardware, and the number of parameters versus the number of outputs.
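
A minimal state-vector sketch of the adjoint method, assuming a single-qubit circuit built from Pauli rotations $\exp(-i\theta_k P_k/2)$ and a Hermitian observable. The function names are hypothetical; simulator libraries implement the same idea with more bookkeeping.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(pauli, theta):
    """Pauli rotation exp(-i * theta * pauli / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli

def adjoint_gradient(paulis, thetas, observable):
    """Return (<B>, gradient) using one forward pass and one backward sweep."""
    state = np.array([1, 0], dtype=complex)
    for p, t in zip(paulis, thetas):
        state = rot(p, t) @ state                 # forward pass
    value = float(np.real(np.vdot(state, observable @ state)))

    lam = observable @ state                      # B applied to the final state
    grads = np.zeros(len(thetas))
    for k in reversed(range(len(thetas))):
        # dU_k/dtheta_k = (-i/2) * P_k * U_k, so insert (-i/2) * P_k after gate k
        grads[k] = 2 * np.real(np.vdot(lam, (-0.5j) * paulis[k] @ state))
        u_dag = rot(paulis[k], thetas[k]).conj().T
        state = u_dag @ state                     # undo gate k instead of storing states
        lam = u_dag @ lam
    return value, grads

value, grads = adjoint_gradient([X, Y], [0.4, 1.1], Z)
print(value, grads)
```

All parameters are differentiated in a single backward sweep, whereas the parameter-shift rule would need two extra circuit evaluations per parameter. The price is direct access to `state` and `lam`, which a physical device does not offer.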

**What circuit should I use?**


* At the moment we don't know which circuit would be best for QML; there is no theoretical foundation.
* Candidates include, e.g., tensor networks
![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1409.png)

**Hybrid Approach**

* In a classical ML pipeline, what you can get from a quantum computer is information about the gradients: what are the partial derivatives of a node (a quantum computation) with respect to its parameters?

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1411.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1412.png)

https://arxiv.org/abs/1909.02108 Quantum Natural Gradient



[QML Meetup: Dr Maria Schuld, Taking stock of quantum machine learning - a critical perspective](https://www.youtube.com/watch?v=8bfUMdj0-x4&t=1384s)

* Neural networks are composable, efficiently differentiable functions that transform inputs by chains of linear & non-linear operations.

* Variational circuits give rise to composable, differentiable functions that map inputs to quantum states and perform linear operations on these states >> sounds much closer to a support vector machine!

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1608.png)

**Maria Schuld**

- What is meant by: In neural networks the differentiation rules that we use scale linearly with the number of parameters. In variational circuits the differentiation rules that we use scale quadratically with the number of parameters.
- Automatic differentiation is based on recycling values of gradients, so that not for every parameter they have to run the whole network again, forwards and backwards.
- Challenge: Just guessing an ansatz (expressive)
- Adding more layers = Increase the frequency of the cosine kernel?? (Min 27 https://youtu.be/8bfUMdj0-x4), then just repeating these layers of encoding would be better. In many cases making an embedding and then repeating it makes the model class richer.
- Quantum advantage for learning is currently still ill-posed.
- What is meant by quantum speedup? Different concepts. It’s always relative to something.
- And what do you mean by ‘more powerful’: learning or speedup? Etc.

> How can I prove that my ansatz is classically intractable? versus **What is an ansatz design that allows gradient descent to scale as efficiently as it does when training neural networks?**

> How can I show that QML is more powerful? versus **How can I understand what QML is doing?**

https://www.youtube.com/watch?v=8bfUMdj0-x4&t=1621s

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1609.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1610.png)



*Challenge 4: Expressibility vs Trainability*

> **Capacity** (=depth of functions): complex functions (can be wide or narrow), also called model power or expressivity

>**Expressibility** (=width of functions): wide array of functions (can be complex or shallow)

> **Trainability**: model is easy to train, or too complex (too wide or too deep/ complex) for a use case

https://arxiv.org/abs/2101.02138

**Expressibility & trainability**

Zoe Holmes: https://youtu.be/RO3g7B0-IKA

https://stats.stackexchange.com/questions/469493/capacity-and-expressivity-of-a-neural-network

**Are expressibility and model capacity (incl. expressivity, power) related?**

Yes, expressibility and model capacity are closely related concepts in the context of machine learning, whether classical or quantum.

Model capacity refers to the complexity of the functions or relationships that a model can represent. In the context of neural networks, for example, model capacity is influenced by the architecture of the network, including the number of layers, the number of nodes in each layer, and the types of activation functions used.

Expressibility, on the other hand, often refers to a similar concept but from the perspective of the functions or states that can be achieved or represented given a specific set of parameters. For instance, the expressibility of a quantum circuit refers to the variety of quantum states or transformations that can be generated by the circuit.

In general, a model with higher capacity has higher expressibility because it can represent a wider variety of functions or states. However, having high capacity or high expressibility doesn't always lead to better performance on a specific task, because it can also make the model more prone to overfitting, harder to train, and more computationally expensive. Hence, finding the right balance between model capacity and expressibility and the complexity of the task at hand is an important part of machine learning model design and training.

**Are there cases where expressibility is high and capacity is small and vice versa, or not?**

> You can have High expressibility, small capacity (shallow neural nets), but unlikely to have Low expressibility, high capacity.


Expressibility and capacity are both measures of the complexity of functions a model can represent. They're closely related and usually increase or decrease together. However, there may be subtle differences depending on the specific definitions used, the model architecture, and the context.

1. **High expressibility, small capacity**: This could potentially occur in situations where a model has a wide range of different functions it can represent (high expressibility) but is limited in the complexity of those functions (small capacity). For example, a shallow neural network (only a few layers deep) can represent a wide variety of different functions, but it may struggle to accurately represent highly complex functions or patterns (like those present in high-dimensional data or complex tasks). This is because it lacks the depth necessary for creating intricate compositional representations.

2. **Low expressibility, high capacity**: This situation might be harder to come by, but one could imagine a scenario where a model has the potential to represent very complex functions (high capacity) but is restricted in the variety of functions it can actually express due to constraints on its parameters (low expressibility). An example could be a deep neural network with high capacity but with parameters constrained in such a way that it can only express a narrow range of functions. However, such a situation might be considered somewhat artificial.

It's important to note that the relationship between capacity and expressibility can also depend on how we define these terms and on the specific context or model architecture. Moreover, the goal in machine learning is often to find a balance between these properties to achieve good generalization: we want a model that is flexible enough to represent the complexity of the task (sufficient capacity and expressibility) but not so flexible that it overfits to the training data.

**Expressibility of circuits: Learning an unknown unitary**

* Expressibility & trainability is a tradeoff!!

* Zoe Holmes: reduce expressibility to increase trainability. Find ansatz that fits the use case problem.

* https://pennylane.ai/qml/demos/tutorial_haar_measure.html

* Barren plateaus, and hence expressibility issues, matter less in classical ML: vanishing cost gradients are less of a problem there because we don't have precision limitations in the same way. You don't have to evaluate your cost function using many, many shots, and getting gradients is not as resource intensive.
* In quantum ML: you can use ideas from control theory to assess in advance whether or not your ansatz is going to be trainable. That's an important strategy.
* The other approach: use the symmetries of your problem; you will have to use physics to come up with a good ansatz. E.g.: when using VQE for a system with various symmetries (particle number conservation, translational symmetry), build all of those symmetries into your ansatz and thereby reduce expressibility while capturing some of the solution space.

* Maria Schuld paper: how expressive are circuits? How are the states a circuit reaches distributed, and how flexible is it? An identity gate maps to one point only. If you have a couple more gates, it maps to more points.

https://arxiv.org/abs/1905.10876: Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1408.png)
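
One way to put a number on "maps to one point only" versus "maps to more points" is to compare the distribution of state fidelities produced by a circuit with the distribution for Haar-random states, in the spirit of the expressibility measure in the paper linked above. The sketch below is a simplified single-qubit version under assumptions: the two toy circuits, the 75 histogram bins, and the sample count are choices made for this example, and the Haar fidelity distribution for one qubit is uniform on $[0, 1]$.

```python
import numpy as np

ZERO = np.array([1, 0], dtype=complex)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    return np.array([[np.exp(-1j * theta / 2), 0], [0, np.exp(1j * theta / 2)]])

def idle_circuit(params):
    """RZ on |0> only changes a global phase: every output is the same state."""
    return rz(params[0]) @ ZERO

def ry_rz_circuit(params):
    """RY followed by RZ can reach any point on the Bloch sphere."""
    return rz(params[1]) @ ry(params[0]) @ ZERO

def expressibility_kl(circuit, n_params, samples=5000, bins=75, seed=0):
    """KL divergence between the circuit's fidelity histogram and the Haar one."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0, 2 * np.pi, size=(samples, n_params))
    b = rng.uniform(0, 2 * np.pi, size=(samples, n_params))
    fids = np.array([abs(np.vdot(circuit(x), circuit(y))) ** 2 for x, y in zip(a, b)])
    fids = np.clip(fids, 0.0, 1.0)            # guard against rounding just above 1
    counts, edges = np.histogram(fids, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    q = np.diff(edges)                        # Haar probability mass per bin (uniform)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

kl_idle = expressibility_kl(idle_circuit, n_params=1)
kl_rich = expressibility_kl(ry_rz_circuit, n_params=2)
print(kl_idle, kl_rich)
```

A lower value means the circuit's states are spread more like Haar-random states, i.e. the circuit is more expressive. The idle circuit puts all fidelities into the last bin and therefore gets the largest possible value for this binning.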

Is expressivity even important? (Maria Schuld) https://www.youtube.com/watch?v=8bfUMdj0-x4&t=1384s

Model expressivity is different from circuit expressivity (Maria Schuld) https://www.youtube.com/watch?v=8bfUMdj0-x4&t=1384s

**Is model expressivity the same as model expressibility?**

The terms "expressivity" and "expressibility" are often used interchangeably in the context of machine learning, deep learning, and artificial intelligence. **Both terms refer to a model's ability to represent a range of functions or hypotheses.**

**Expressivity, more commonly used in computational fields**, describes how well a model can approximate a wide variety of functions. For example, a neural network with a higher number of layers and neurons would be considered more expressive than a network with fewer layers and neurons because it can theoretically approximate a greater variety of functions.

Similarly, "expressibility" can refer to the richness of a model's representational space or the complexity of the functions it can represent. It also implies the ability of the model to explain or represent complex relationships in the data.

However, a highly expressive or highly expressible model isn't always the best. Models that are too expressive may overfit to the training data, capturing noise and failing to generalize to unseen data. This concept is known as overfitting, and it's a common challenge in machine learning.

In conclusion, while the terms expressivity and expressibility are often used interchangeably, you should always consider the context in which they are used, as some communities or researchers may assign slightly different meanings to them. Both terms generally refer to a model's capacity to represent a wide range of functions or hypotheses.


**Are model capacity and model trainability the same?**


No, model capacity and model trainability are not the same; they are two distinct concepts in machine learning.

Model capacity refers to the complexity of the patterns a model is capable of learning from the data. It's often associated with the size or complexity of the model — a model with more parameters (like a deep neural network) has a higher capacity than one with fewer parameters (like a simple linear regression). A higher capacity model is theoretically able to learn more complex relationships in the data, but this also opens up the risk of overfitting, where the model learns the noise or specific quirks of the training data instead of the general underlying patterns.

Model trainability, on the other hand, refers to the ease with which a model can be trained. This is influenced by factors such as the design of the learning algorithm, the optimization techniques used, the presence of an appropriate loss function, the quality and relevance of the training data, and so on. A model might have high capacity (i.e., it could theoretically learn complex patterns), but if it's not easily trainable, it may not reach its potential.

So, in summary, capacity and trainability are two sides of the machine learning coin: the former about what the model could potentially learn, and the latter about how efficiently or effectively it can learn it.

**What does expressibility of a learning circuit mean?**

The expressibility of a learning circuit, whether classical or quantum, refers to the variety or complexity of functions, mappings, or patterns that the circuit can represent or learn given a set of parameters.

For instance, in a neural network, which can be viewed as a kind of classical learning circuit, the expressibility is related to the network's ability to approximate complex functions. A neural network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy given enough neurons and a suitable non-polynomial activation function (this is known as the universal approximation theorem), but it may require a very large number of neurons to do so. Adding more layers to the network (making it a deep network) can increase its expressibility, allowing it to approximate complex functions more efficiently.

Similarly, in a quantum circuit used for machine learning (a variational quantum circuit), the expressibility refers to the range of quantum states that the circuit can generate or the set of unitary transformations it can perform. The expressibility of a quantum learning circuit can affect its ability to learn complex patterns in quantum data.

In both cases, having high expressibility can be beneficial because it allows the learning circuit to represent more complex patterns. However, it can also make the circuit more difficult to train, because the optimization landscape that needs to be navigated to find the best set of parameters can become more complex and may include more local minima. In the quantum case, there is also the phenomenon of "barren plateaus" where the optimization landscape becomes flat, making it difficult for gradient-based methods to find a good solution.

**What does expressibility of a quantum circuit mean?**

In the context of quantum computing, the expressibility of a quantum circuit refers to the variety or range of quantum states that the circuit can generate or the set of unitary operations it can perform, given a set of parameters.

Expressibility is an important property to consider when designing quantum circuits for quantum algorithms, especially variational quantum algorithms. A more expressive quantum circuit has the potential to represent a wider range of quantum states or unitary transformations, which may make it more powerful or versatile for certain tasks.

However, there is a trade-off between expressibility and other properties of the circuit:

1. **Trainability**: While having a highly expressive quantum circuit may sound beneficial, it can actually make the circuit harder to train. This is due to the so-called "Barren Plateaus" phenomenon, where the cost function landscape that you're trying to optimize becomes flat in all directions for highly expressive circuits, making it difficult for gradient-based optimization methods to find a good solution.

2. **Noise sensitivity**: More expressive quantum circuits typically involve a greater number of quantum gates, which can make the circuit more susceptible to noise and errors in a real quantum device.

3. **Computational cost**: More expressive quantum circuits can require a larger number of qubits and quantum operations, which increases the computational cost of running and simulating the circuit.

So, while the expressibility of a quantum circuit is an important factor in quantum algorithm design, it's only one piece of the puzzle. Other factors like trainability, noise sensitivity, and computational cost also need to be taken into account.

*Challenge 5: Data Encoding (Embeddings)*

**Quantum models are kernel methods** of a very specific kind: https://www.youtube.com/watch?v=pe1d0RyCNxY&t=2655s

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1406.png)

https://arxiv.org/abs/2001.03622 - Quantum embeddings for machine learning
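
A small sketch of the kernel view, assuming a hypothetical one-qubit angle embedding $|\phi(x)\rangle = RY(x)|0\rangle$ of a scalar input. The kernel is the squared overlap of two embedded states, $k(x, x') = |\langle\phi(x)|\phi(x')\rangle|^2$, and the resulting Gram matrix can be handed to a classical kernel method.

```python
import numpy as np

def feature_state(x):
    """Hypothetical angle embedding of a scalar x: RY(x)|0>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

def quantum_kernel(x1, x2):
    """k(x1, x2) = |<phi(x1)|phi(x2)>|^2."""
    return abs(np.vdot(feature_state(x1), feature_state(x2))) ** 2

data = np.array([0.0, 0.4, 1.3, 2.9])
gram = np.array([[quantum_kernel(a, b) for b in data] for a in data])
eigenvalues = np.linalg.eigvalsh(gram)
print(gram.round(3))
print(eigenvalues.min())  # a valid kernel gives a positive semidefinite Gram matrix
```

For this embedding the kernel reduces to $\cos^2((x - x')/2)$, so the choice of encoding alone fixes which similarity measure the model uses.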



**Data Encoding is the most important!**

https://youtu.be/pe1d0RyCNxY?t=3008

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1407.png)

The Helstrom measurement is the measurement that has the minimum error probability when trying to distinguish between two states.

https://en.m.wikipedia.org/wiki/Quantum_state_discrimination

After we have embedded / encoded our data, and if we encode it in a way that separates one data class from another in the Hilbert space, we already know which measurement is the best one to do. We know quite a bit about the measurements that distinguish data. **Maybe after encoding the data, we are already done**. We could even train the encoding!
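
The minimum error probability of the Helstrom measurement has a closed form, $P_{\text{err}} = \tfrac{1}{2}\left(1 - \lVert p_0\rho_0 - p_1\rho_1\rVert_1\right)$, so the quality of an embedding can be checked before choosing any measurement circuit. The sketch below assumes a hypothetical one-qubit angle embedding and equal priors; the function names are made up for this example.

```python
import numpy as np

def embed(x):
    """Hypothetical angle embedding of a scalar x: RY(x)|0>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

def density(psi):
    """Density matrix |psi><psi| of a pure state."""
    return np.outer(psi, psi.conj())

def helstrom_error(rho0, rho1, p0=0.5):
    """Minimum error probability for distinguishing rho0 (prior p0) from rho1."""
    gamma = p0 * rho0 - (1 - p0) * rho1
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(gamma)))
    return float(0.5 * (1.0 - trace_norm))

# Two classes embedded close together vs. embedded into orthogonal states.
close = helstrom_error(density(embed(0.0)), density(embed(0.5)))
far = helstrom_error(density(embed(0.0)), density(embed(np.pi)))
print(close, far)
```

The further apart the embedding places the two classes in Hilbert space, the smaller the best achievable error, which is why training the encoding itself is a natural idea.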

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1410.png)







https://github.com/Christophe-pere/Roadmap-to-QML/blob/main/Reviews/Reviews.md


https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/QML/Quantum_Machine_Learning.pdf

https://www.nature.com/articles/s41467-022-31679-5

https://arxiv.org/abs/2208.06198

**Quantum Machine Learning: what do you mean?**

* Enhance classical or physics
* Classical vs quantum native data
* Discriminative vs generative approaches
* NISQ hybrid (annealing, variational) vs error-corrected
* Kernel-methods vs neural nets


**Open Questions in Quantum Machine Learning**
* Are quantum circuits actually useful models? (can they help generalize?)
* How can we benchmark them?
* Is there a connection between quantum theory and deep learning?
* How does noise impact applications?
* What data domains is QML good for?
* Do QML ideas scale?
* What optimization strategies work for quantum circuits?
* Which circuit architectures are good for ML?

**Problems with traditional quantum computing applied to QML**

* The traditional approach checks how to apply Grover's search to machine learning, or how to speed up a neural network.
* First, it talks about runtime speedups but does not care about practicality (100 million years instead of forever to train is a strong speedup, but useless in practical machine learning).
* Second, there is too much focus on whether something is provable. Lots of things in ML are not provable: training by optimization is NP-hard, and for quantum computing it's unsolvable. Most of what we do in machine learning is not provable, it just works.
* Near-term quantum computing, on the other hand, is too empirical: "this and that is faster than when we run it on a small classical neural network". These are applied experiments in small regimes that no one cares about (a few qubits).
* Third step: develop a theory of QML, taken from classical ML.
> **The goal of machine learning is generalization, not speedup.** But we need to understand generalization theoretically, because we can't just try things out: quantum computers are too small and noisy.

IPAM: [Mathematical and Computational Challenges in Quantum Computing](http://www.ipam.ucla.edu/programs/long-programs/mathematical-and-computational-challenges-in-quantum-computing/?tab=application)

**Amira Abbas**

https://youtu.be/qcB8tfISWHo

https://youtu.be/aU8XBjG5tAw


https://youtu.be/ETxQNIR6dAg

**Zoe Holmes**

Translate the problem into an energy minimization / ground-state problem.

Measure the cost function on a quantum computer, because it may be easy to evaluate there while hard on a classical machine.

But the cost landscape is often complex and difficult to navigate with gradient descent.

Other optimization methods than gradient descent?

Shot noise to navigate quantum landscapes.

Quantum benchmarking is not a scalable route for proving quantum supremacy.

A lot of back and forth between quantum and classical. But that is good: it helps to understand quantum systems (and quantum-inspired methods).

QML in chemistry: you need chemical precision, which takes lots of shots.

Materials science and condensed matter will come earlier than chemistry.

QML in chemistry: less QML and more variational quantum algorithms for electronic structure calculations, computing ground states and excited states of molecular Hamiltonians. Used to compute reaction rates.

There is no collapse. Everything is unitary, which leads to the many-worlds interpretation.

Source: https://youtu.be/zws9B66juBk


The overhead cost of the variational quantum eigensolver (VQE) and quantum phase estimation (QPE) algorithms comes from the many iterations needed to find the ground state, i.e. the lowest eigenvalue.

However, a trick published by Artur Izmaylov and co-authors can reduce the computational overhead by lowering the norm of the Hamiltonian.

How to increase the efficiency of quantum phase estimation for quantum chemistry: lower the norm of the electronic Hamiltonian with a new symmetry subtraction trick. See the paper for details:

https://arxiv.org/abs/2304.13772

###### *Concept Class, Hypothesis Class and Learner*

"*The hypotheses could be specified by a polynomial-sized Boolean circuit, in which case they are **classically efficiently evaluatable**. On the other hand, the hypotheses could also be specified by polynomial-depth quantum circuits, in which case they are **quantumly efficiently evaluatable**.*"

Source: "Exponential separations between classical and quantum learners" (2023)

Here are some examples of hypothesis classes and learning algorithms in computational learning theory:

**Hypothesis class:**

* The set of all linear classifiers.
* The set of all decision trees of a certain maximum depth.
* The set of all neural networks with a certain number of hidden layers and neurons.

**Learning algorithm:**

* The perceptron algorithm.
* The ID3 algorithm.
* The backpropagation algorithm.

The hypothesis class is the set of all possible hypotheses that the learning algorithm can consider. The learning algorithm is the procedure that the learner uses to choose a hypothesis from the hypothesis class.

In the case of the perceptron algorithm, the hypothesis class is the set of all linear classifiers. The perceptron algorithm works by iteratively updating its hypothesis after every misclassified example. If the training data is linearly separable, it converges to a hypothesis that correctly classifies all of the training data; otherwise it never settles.

In the case of the ID3 algorithm, the hypothesis class is the set of all decision trees of a certain maximum depth. The ID3 algorithm works by recursively splitting the training data into smaller and smaller subsets until each subset is classified by a single leaf node.

In the case of the backpropagation algorithm, the hypothesis class is the set of all neural networks with a certain number of hidden layers and neurons. The backpropagation algorithm works by iteratively adjusting the weights of the neural network until it minimizes the error on the training data.

These are just a few examples of hypothesis classes and learning algorithms in computational learning theory. There are many other possible hypothesis classes and learning algorithms, and the choice of which one to use depends on the specific learning problem being solved.
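
To make the split between hypothesis class and learner concrete, here is a minimal sketch of the perceptron example. The hypothesis class is the set of linear classifiers $x \mapsto \mathrm{sign}(w \cdot x + b)$ and the learner is the update loop. The toy data, the margin filter, and the target concept $\mathrm{sign}(x_1 + x_2 - 1)$ are assumptions made up for illustration.

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Learner: picks (w, b) from the hypothesis class {x -> sign(w.x + b)}; labels are -1/+1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified or on the boundary
                w += yi * xi             # move the hyperplane towards the example
                b += yi
                mistakes += 1
        if mistakes == 0:                # a full pass without errors: converged
            break
    return w, b

# Toy data: the (normally unknown) target concept is sign(x1 + x2 - 1).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(200, 2))
score = X[:, 0] + X[:, 1] - 1.0
keep = np.abs(score) > 0.2               # enforce a margin so the data is cleanly separable
X, y = X[keep], np.sign(score[keep])

w, b = perceptron(X, y)
train_accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, train_accuracy)
```

Because the target concept here lies inside the hypothesis class, the learner can reach zero training error. With a non-linear target the same learner would keep making mistakes, which is the mismatch between concept class and hypothesis class discussed in this section.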

* The [Concept Class](https://en.m.wikipedia.org/wiki/Concept_class) (=limited by our use case) contains the true relationships (all possible things) we want to learn = Linear and non-linear functions for a spam filter.
  * The "true" function or the actual relationship might be unknown to us. Our goal in learning is to approximate it.
  * A concept class is a set of **all possible concepts that can be learned**. A concept is a labeling function on the inputs (equivalently, the set of inputs that it labels positive). For example, a concept could be the function that labels images as cat or dog, and the concept class is the set of all such candidate labelings. The hypothesis class is then the set of all possible neural networks with 3 layers, and the learner is the gradient descent algorithm.

* The Hypothesis Class (=limited by our learner from space of concept classes, used to approximate a concept) (e.g. neural network or ID3 decision tree): While the concept class denotes the true relationships, the hypothesis class is the set of functions that a learning algorithm considers when trying to produce a model from data.
  * = Set of models we consider based on our chosen learning method (set of all possible ways we can represent the things we want to learn). Set of models or functions that our chosen learning algorithm can possibly produce = only linear functions (This might miss out on some true nonlinear relationships present in the concept class).
  * goal of a learning algorithm (the learner) is to probably approximate some unknown target concept c ∈ C from random labeled examples (from "Optimal Quantum Sample Complexity of Learning Algorithms")
  * An ideal situation is when our chosen hypothesis class contains the true function (from the concept class). However, this is not always the case, which is why choosing the right type of model (and thereby the right hypothesis class) is crucial in machine learning.
  * Hypothesis Class: In classification in general, the hypothesis class is the **set of possible (classification) functions** you're considering; the **learning algorithm picks a function from the hypothesis class**. For a decision tree learner, the hypothesis class would just be the set of all possible decision trees. [Source](https://stats.stackexchange.com/questions/270324/what-is-a-hypothesis-class-in-svm#:~:text=In%20classification%20in%20general%2C%20the,of%20all%20possible%20decision%20trees.)

* The Learner or learning algorithm (e.g. gradient descent, logistic regression, linear regression, decision trees and random forests) is the algorithm that produces a specific model from the hypothesis class using data: it chooses the best hypothesis from the hypothesis class given the training data.
  * Its goal is to find a hypothesis (a specific function or model from the hypothesis class) that approximates the true function (from the concept class) as closely as possible.
  * In practical terms, the learner could be an algorithm like a decision tree, neural network, linear regression, etc. Given some data, the learner will produce a specific model, which is a member of the hypothesis class, to make predictions or decisions.
  * The goal of a learning algorithm (the learner) is to probably approximate some unknown target concept c ∈ C (concept class) from random labeled examples.

* from [PAC learning](https://en.m.wikipedia.org/wiki/Probably_approximate): the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions

**Differentiate:**

https://de.wikipedia.org/wiki/Induktive_Verzerrung

https://de.wikipedia.org/wiki/Hypothesenraum

https://de.wikipedia.org/wiki/Versionsraum

https://en.m.wikipedia.org/wiki/Sample_space (Ergebnisraum)

**Computational Learning Theory**

* Statistical Learning Theory (SLT): Formal study of learning algorithms.

* Computational Learning Theory (CoLT): Formal study of learning tasks.

  * **PAC Learning (Theory of Learning Problems)**: PAC learning seeks to quantify the difficulty of a learning task and might be considered the premier sub-field of computational learning theory. Consider that in supervised learning, we are trying to approximate an unknown underlying mapping function from inputs to outputs. We don’t know what this mapping function looks like, but we suspect it exists, and we have examples of data produced by the function. PAC learning is concerned with how much computational effort is required to find a hypothesis (fit model) that is a close match for the unknown target function.

  * **VC Dimension (Theory of Learning Algorithms)**: The VC dimension estimates the capability or capacity of a classification machine learning algorithm for a specific dataset (number and dimensionality of examples). **Formally, the VC dimension is the largest number of examples from the training dataset that the space of hypothesis from the algorithm can “shatter.”** The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H.

* https://machinelearningmastery.com/introduction-to-computational-learning-theory/
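
The shattering definition above can be checked by brute force on small examples. The sketch below is a hedged toy: it restricts both the instance space and the hypothesis classes to small finite sets (an assumption made so that enumeration is possible) and uses 1D thresholds and 1D intervals, whose VC dimensions are known to be 1 and 2. All names are illustrative.

```python
from itertools import combinations, product

def is_shattered(points, hypotheses):
    """True if the hypotheses realize all 2^k labelings of the given k points."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

def vc_dimension(domain, hypotheses, max_size):
    """Size of the largest subset of the finite domain that is shattered (searched up to max_size)."""
    d = 0
    for k in range(1, max_size + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            d = k
    return d

domain = [0, 1, 2, 3, 4]
cuts = [-0.5, 0.5, 1.5, 2.5, 3.5, 4.5]

# Thresholds: h_t(x) = 1 if x >= t.  Intervals: h_{a,b}(x) = 1 if a <= x <= b.
thresholds = [lambda x, t=t: int(x >= t) for t in cuts]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a, b in product(cuts, cuts) if a <= b]

vc_thresholds = vc_dimension(domain, thresholds, max_size=3)
vc_intervals = vc_dimension(domain, intervals, max_size=3)
print(vc_thresholds, vc_intervals)
```

Thresholds cannot label a left point 1 and a right point 0, so no pair is shattered. Intervals cannot realize the labeling (1, 0, 1) on three ordered points, so no triple is shattered.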


###### *Model Complexity*

Model Complexity = Capacity / Power / Expressivity

Model complexity is often measured by the number of parameters in a machine learning model, although parameter count is only a proxy. A model with more parameters is more complex and can potentially learn more complex functions. However, a more complex model is also more likely to overfit the training data and not generalize well to new data.


Generalization (Prediction error) depends on both the training error as well as the complexity of the trained model. The prediction error is small only if training error is itself small and the complexity of the trained model is moderate (i.e., sufficiently smaller than training data size).


**Papers on Capacity Measures** [Source](https://www.youtube.com/watch?v=fDIGmkq9xNE)

* VC dimension. (Vapnik, Vladimir, and Alexey Chervonenkis. "Theory of pattern recognition." (1974).)

* Margin-based measures. Looking at the margin alone is not sufficient (Bartlett, P., Foster, D. J., & Telgarsky, M. (2017). Spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1706.08498.)

* PAC-Bayes approaches (Neyshabur, B., Bhojanapalli, S., & Srebro, N. (2017). A pac-bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564.)

* Sharpness-based measures (Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2016). On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.)

* Norm-based measures (Liang, T., Poggio, T., Rakhlin, A., & Stokes, J. (2019, April). Fisher-rao metric, geometry, and complexity of neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 888-896). PMLR.)

* Y. Jiang, B. Neyshabur, H. Mobahi, D. Krishnan, and S. Bengio. Fantastic generalization measures and where to find them, 2019.

**Different measures of model capacity and their usefulness** (ED: effective dimension)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1644.png)

There is a relationship between Generalization and capacity (max capacity is not necessarily what we want):


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1399.png)

* [Neural network capacity](https://en.m.wikipedia.org/wiki/Artificial_neural_network#Capacity): A model's "capacity" property corresponds to its ability to model any given function. It is related to the amount of information that can be stored in the network and to the notion of complexity.

* **Model capacity**: complexity of the patterns a model is capable of learning from the data. Is often associated with size or complexity of model — a model with more parameters (like a deep neural network) has a higher capacity than one with fewer parameters (like a simple linear regression).
  * A higher capacity model is theoretically able to learn more complex relationships in the data, but this also opens up the risk of overfitting, where the model learns the noise or specific quirks of the training data instead of the general underlying patterns.

* **Expressivity**: how well a model can approximate a wide variety of functions? A neural network with a higher number of layers and neurons would be considered more expressive than a network with fewer layers and neurons because it can theoretically approximate a greater variety of functions.

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1602.png)

*Source: [QML Meetup: Dr David Sutter (IBM Research, Zurich), The power of quantum neural networks](https://www.youtube.com/watch?v=ETxQNIR6dAg&t=684s)*

*VC Dimensions to measure power / capacity of a model*

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1397.png)


*Effective Dimensions to measure power / capacity of a model*

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1398.png)


https://www.youtube.com/watch?v=fDIGmkq9xNE&t=2067s

*Vapnik-Chervonenkis (VC) Dimension (measure of the capacity)*

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1643.png)

* https://www.bogotobogo.com/python/scikit-learn/scikit_machine_learning_VC_Dimension_Shatter.php

* Code example: https://www.geeksforgeeks.org/vapnik-chervonenkis-dimension/

* Video: https://youtu.be/puDzy2XmR5c?si=FTPneApRNiWQkJey

* VC dimension is a method to measure model complexity. It is a measure of the capacity of a hypothesis space, which is the set of all possible hypotheses that can be learned by a machine learning algorithm. A hypothesis space with a higher VC dimension can learn more complex functions, but it will also be more likely to overfit the training data.

* The VC Dimension can provide an upper bound on the sample complexity in terms of the size of the hypothesis class, the desired error rate, and the confidence level.

* In [Vapnik–Chervonenkis theory](https://en.m.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_theory), the [Vapnik–Chervonenkis (VC) dimension](v) is a **measure of the capacity (complexity, expressive power, richness, or flexibility)** of a set of functions that can be learned by a statistical binary classification algorithm.

* It is defined as the cardinality of the largest set of points that the algorithm can [shatter](https://en.m.wikipedia.org/wiki/Shattered_set), which means the algorithm can always learn a perfect classifier for any labeling of at least one configuration of those data points.

* Relationship: The VC dimension can be used to derive bounds on the sample complexity. The sample complexity can be used to estimate the VC dimension.

* The VC dimension of a set of functions can be used to bound the generalization error of a learning algorithm. The generalization error is the error that a learning algorithm makes on new data that it has not seen before. VC theory states that, with high probability, the gap between the generalization error and the training error is bounded by a term that grows with the VC dimension $d$ of the hypothesis space and shrinks with the number of training examples $n$ (roughly like $\sqrt{d/n}$, up to logarithmic factors). In other words, **the more complex the hypothesis space, the more likely it is that the learning algorithm will overfit the training data and make errors on new data** (this was assumed for neural nets). The VC dimension can be used to choose a hypothesis space that is not too complex, so that the learning algorithm can generalize well to new data.
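
To get a feeling for the numbers, the sketch below evaluates one common textbook form of the VC generalization bound, $\text{gap} \le \sqrt{\tfrac{8}{n}\left(d \ln\tfrac{2en}{d} + \ln\tfrac{4}{\delta}\right)}$, which holds with probability at least $1-\delta$. The constants differ between references, so treat the exact values as an assumption; only the scaling in $d$ and $n$ matters here. The function name and the example sizes are illustrative.

```python
import math

def vc_gap_bound(n, d, delta=0.05):
    """One textbook VC bound on (test error - training error); assumes 2n >= d."""
    growth_term = d * math.log(2 * math.e * n / d)   # Sauer's lemma bound on the growth function
    return math.sqrt(8.0 / n * (growth_term + math.log(4.0 / delta)))

small_model = vc_gap_bound(n=100_000, d=10)        # low-capacity hypothesis space
large_model = vc_gap_bound(n=100_000, d=50_000)    # capacity comparable to the data size
print(small_model, large_model)
```

For the small hypothesis space the bound is informative (below 0.1), while for the large one it exceeds 1 and therefore says nothing, since an error rate can never be larger than 1. This is the regime that large neural networks are in.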

**Wouldn't that mean that large neural networks would overfit very badly and never generalize well? But since neural networks do generalize very well, what is the problem with Vapnik-Chervonenkis Dimension?**

The VC bound would indeed seem to imply that large neural networks overfit very badly and never generalize well. However, there are a few things to keep in mind.

First, the VC dimension yields a worst-case bound: it holds for every data distribution and every hypothesis in the class, so it can be far too pessimistic for the distributions and trained networks that occur in practice. The bound also assumes that the training data is drawn i.i.d. (independent and identically distributed). In practice the training data is often not i.i.d., which makes the bound even less informative.

Second, the VC dimension is only a bound on the generalization error. It is possible for a neural network to have a large VC dimension and still generalize well, if the training data is large enough.

Finally, there are a number of techniques that can be used to prevent neural networks from overfitting, such as regularization and dropout. These techniques can help to reduce the complexity of the neural network and make it more likely to generalize well.

So, while the VC dimension is an important concept in machine learning, it is not the only factor that determines whether a neural network will overfit or generalize well. Other factors, such as the quality of the training data and the use of regularization techniques, also play a role.

Here are some additional thoughts on the limitations of the VC dimension:

* The VC dimension is only a theoretical bound, and it is not always accurate in practice.
* The VC dimension does not take into account the complexity of the target function.
* The VC dimension does not take into account the optimization algorithm used to train the neural network.

Despite these limitations, the VC dimension is still a useful concept in machine learning. It can be used to understand the complexity of a neural network and to choose a neural network that is not too complex, so that it can generalize well to new data.

*Bias-Variance Tradeoff*

* **Bias-Variance Tradeoff**: This is a key principle in statistical learning that helps to understand the tradeoff between model complexity and the risk of overfitting, which impacts the number of samples needed for learning.

*Rademacher Complexity*

* **Rademacher Complexity**: This is another measure of complexity of a function class that can be used to provide an upper bound on sample complexity. Rademacher complexity measures the ability of functions in the class to fit to random noise and is used in deriving generalization bounds.
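
As a hedged illustration of "fitting random noise", the sketch below estimates the empirical Rademacher complexity of norm-bounded linear functions $\{x \mapsto w \cdot x : \lVert w\rVert_2 \le B\}$ by Monte Carlo. For this class the supremum over $w$ has the closed form $\tfrac{B}{n}\lVert\sum_i \sigma_i x_i\rVert_2$, which the code uses. The data and all names are assumptions for illustration.

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of {x -> w.x : ||w||_2 <= B}."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))   # random sign (noise) labels
    # sup over ||w|| <= B of (1/n) sum_i sigma_i w.x_i  =  (B/n) * ||sum_i sigma_i x_i||_2
    sups = B / n * np.linalg.norm(sigma @ X, axis=1)
    return sups.mean()

rng = np.random.default_rng(1)
X_small = rng.normal(size=(50, 5))      # few samples
X_large = rng.normal(size=(2000, 5))    # many samples

r_small = empirical_rademacher_linear(X_small)
r_large = empirical_rademacher_linear(X_large)
print(r_small, r_large)
```

The estimate shrinks as the sample grows (roughly like $1/\sqrt{n}$), reflecting that a fixed function class can correlate less and less with pure noise on larger samples.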

*Covering Numbers (in sample complexity)*

* Example: *we provide bounds on the expressivity of the class of CPTP maps (or unitaries) that a quantum machine learning model (QMLM) can implement in terms of the number of trainable elements used in the architecture. As a measure of expressivity, we choose covering numbers and metric entropies w.r.t. (the metric induced by) the diamond norm.* (from: Generalization in quantum machine learning from few training data)

* **Covering Numbers and Fat-Shattering Dimension**: These are other measures of complexity of function classes that can be used to derive upper bounds on sample complexity.

* [The Most Important Concept in Topology and Analysis | Compactness](https://youtu.be/td7Nz9ATyWY)

*Fat-Shattering Dimension*

* **fat-shattering dimension and VC dimension (Vapnik-Chervonenkis dimension)**

* The fat-shattering dimension is a complexity measure used in statistical learning theory. It's one way of quantifying the **complexity or expressive power of a class of functions, and it's used to derive bounds on the generalization error** of a learning algorithm.

* The formal definition of the fat-shattering dimension is a bit technical, but here's the general idea: suppose you have a class of functions, and you want to understand how complex this class is. One way of doing this is to look at how these functions can separate or "shatter" sets of points. If you can find large sets of points that can be arbitrarily labeled (i.e., "shattered") by functions in your class, then the class has high complexity.

* Now, for the fat-shattering dimension, you add an additional constraint: you require a certain margin (the "fatness") between points that are labeled differently. The largest set of points that can be shattered with a given margin is used to define the fat-shattering dimension of the function class.

* The fat-shattering dimension is related to the VC dimension (Vapnik-Chervonenkis dimension), another complexity measure that is widely used in statistical learning theory. However, **while the VC dimension only considers exact separation of points, the fat-shattering dimension takes into account this idea of a margin, which makes it more suitable for dealing with "noisy" or non-separable data.**

* the concept of the fat-shattering dimension is often used in the analysis of machine learning algorithms, particularly those based on empirical risk minimization. It can help to understand the trade-off between the expressive power of a function class (which often corresponds to the complexity of a learning algorithm) and the ability of the algorithm to generalize well to unseen data.

*Fisher Information & Cramér-Rao (Bounds on Precision of Estimator)*


Measure of redundancy. How much of model is active vs inactive

Fisher information: sensitivity of the model's output distribution to changes in its parameters --> measure of model capacity (?), see the video by Amira Abbas from QHack 2022


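> Fisher information is the negative expected value of the second derivative of the log-likelihood function (equivalently, under regularity conditions, the expected squared first derivative, i.e. the variance of the score)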
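
Both forms of this definition can be checked numerically. The minimal sketch below does so for a single Bernoulli($p$) observation, where the closed form is $I(p) = 1/(p(1-p))$. Derivatives are taken by finite differences; the function name and the step size `eps` are assumptions chosen for illustration.

```python
import numpy as np

def bernoulli_fisher_information(p, eps=1e-4):
    """Fisher information of one Bernoulli(p) draw, via the score and via the curvature."""
    outcomes = np.array([0.0, 1.0])
    probs = np.array([1.0 - p, p])

    def loglik(q):
        # log-likelihood of each possible outcome under parameter q
        return outcomes * np.log(q) + (1.0 - outcomes) * np.log(1.0 - q)

    score = (loglik(p + eps) - loglik(p - eps)) / (2 * eps)               # first derivative
    curvature = (loglik(p + eps) - 2 * loglik(p) + loglik(p - eps)) / eps ** 2  # second derivative
    from_score = np.sum(probs * score ** 2)        # E[(d/dp log f)^2]
    from_curvature = -np.sum(probs * curvature)    # -E[d^2/dp^2 log f]
    return from_score, from_curvature

from_score, from_curvature = bernoulli_fisher_information(0.3)
closed_form = 1.0 / (0.3 * 0.7)
print(from_score, from_curvature, closed_form)
```

The two numerical values agree with each other and with the closed form up to finite-difference error. Dropping the minus sign in the curvature form would give a negative number, which cannot be an information measure.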

**Can you explain quantum fisher information matrix in more detail please?**

Quantum Fisher Information Matrix (QFIM) is an extension of the concept of Fisher Information Matrix from classical statistics to quantum mechanics. It measures the amount of information that a quantum state carries about an unknown parameter. This parameter might be related to some aspect of the quantum state or a transformation applied to it.

The Fisher Information, in the classical sense, is a measure of how much information an observable random variable carries about an unknown parameter of the distribution. In the quantum case, the Quantum Fisher Information (QFI) is a measure of how much information a quantum state carries about an unknown parameter.

Given a parameterized quantum state $\rho(\theta)$, where $\theta$ is the parameter vector we are interested in, the QFIM is defined in terms of the symmetric logarithmic derivatives (SLDs). These are Hermitian operators $L_i(\theta)$ that satisfy $\partial_{\theta_i}\rho(\theta) = \tfrac{1}{2}\left(L_i(\theta)\rho(\theta) + \rho(\theta)L_i(\theta)\right)$.

The entries of the QFIM, denoted as $H(\theta)$, are given by:

$$H_{ij}(\theta) = \tfrac{1}{2}\operatorname{Tr}\left[\rho(\theta)\,\{L_i(\theta), L_j(\theta)\}\right]$$

where $L_i$ and $L_j$ are the SLDs corresponding to the parameters $\theta_i$ and $\theta_j$, respectively, and $\{\cdot,\cdot\}$ denotes the anticommutator.

Like the classical Fisher Information Matrix, the QFIM can be used to define a lower bound on the variance of an unbiased estimator for the parameters θ. This is the Quantum Cramér-Rao Bound.

In practical terms, the QFIM and Quantum Cramér-Rao Bound are used in quantum metrology to quantify the ultimate limit to precision that can be achieved in estimating parameters, such as phase shifts or magnetic fields, based on quantum mechanical measurements.
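
For a pure state with a single parameter, the QFIM reduces to the scalar $H(\theta) = 4\left(\langle\partial_\theta\psi|\partial_\theta\psi\rangle - |\langle\psi|\partial_\theta\psi\rangle|^2\right)$, so no SLD has to be constructed explicitly. The sketch below evaluates this for the phase-estimation toy state $|\psi(\theta)\rangle = e^{-i\theta Z/2}|+\rangle$, an example chosen here for illustration. The derivative is approximated by central finite differences, and the function names are assumptions.

```python
import numpy as np

def state(theta):
    """|psi(theta)> = exp(-i theta Z / 2) |+>, written out in the computational basis."""
    return np.array([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)]) / np.sqrt(2)

def pure_state_qfi(state_fn, theta, eps=1e-6):
    """Single-parameter QFI of a pure state: 4 (<dpsi|dpsi> - |<psi|dpsi>|^2)."""
    psi = state_fn(theta)
    dpsi = (state_fn(theta + eps) - state_fn(theta - eps)) / (2 * eps)  # numerical derivative
    return 4 * (np.vdot(dpsi, dpsi).real - abs(np.vdot(psi, dpsi)) ** 2)

qfi = pure_state_qfi(state, theta=0.7)
print(qfi)
```

For this family the QFI equals four times the variance of the generator $Z/2$ in the state $|+\rangle$, which is 1, independent of $\theta$. The quantum Cramér-Rao bound for $n$ repetitions is therefore $\mathrm{Var}(\hat\theta) \ge 1/n$.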


**Does Fisher information belong to sample complexity or more to model complexity?**

Fisher information belongs to **model complexity**. It is a measure of how much information the data provides about the parameters of a model. A higher Fisher information means that the data is more informative about the parameters, and therefore the uncertainty in the parameter estimates is lower.

The sample complexity, on the other hand, is the number of samples needed to achieve a certain level of accuracy in parameter estimation. The sample complexity depends on the Fisher information, as well as other factors such as the noise level in the data.

In other words, the Fisher information is a property of the statistical model, while the sample complexity is a property of the estimation task (the model together with a target accuracy).

A concrete example helps to see the difference. Suppose you estimate the bias $p$ of a coin. A single flip carries Fisher information $I(p) = 1/(p(1-p))$, so $n$ independent flips carry $n\,I(p)$, and any unbiased estimator has variance at least $p(1-p)/n$. The sample complexity is then the number of flips needed to push this variance below a target $\varepsilon^2$, namely $n \ge p(1-p)/\varepsilon^2$.


**Does Cramér-Rao bound belong to sample complexity or more to model complexity?**

The Cramér-Rao bound (CRB) belongs to **model complexity**. It is a lower bound on the variance of any unbiased estimator of a parameter, given a fixed model. The CRB depends on the Fisher information, which is a measure of how much information the data provides about the parameter. A higher Fisher information means that the data is more informative about the parameter, and therefore the CRB is lower.

The sample complexity, on the other hand, is the number of samples needed to achieve a certain level of accuracy in parameter estimation. The sample complexity depends on the CRB, as well as other factors such as the noise level in the data.

In general, the CRB is a more fundamental concept than the sample complexity. It is used to understand the fundamental limits of parameter estimation, regardless of the amount of data available. The sample complexity, on the other hand, is more practical, as it tells us how much data we need to achieve a certain level of accuracy.

Here is a table summarizing the key differences between the CRB and the sample complexity:

| Feature | Cramér-Rao bound | Sample complexity |
|---|---|---|
| What does it measure? | Lower bound on the variance of any unbiased estimator | Number of samples needed to achieve a certain level of accuracy |
| Depends on | Fisher information | Fisher information, noise level, and other factors |
| More fundamental | Yes | No |
| More practical | No | Yes |
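
The relation between the two columns of the table can be made concrete with a small simulation. This is a hedged sketch for a Bernoulli parameter $p$: the sample mean is an unbiased estimator whose variance should match the Cramér-Rao bound $p(1-p)/n$, and inverting the bound gives the number of samples needed for a target standard error. All names and numbers are illustrative assumptions.

```python
import math
import numpy as np

p_true, n, trials = 0.3, 200, 20_000
rng = np.random.default_rng(0)

# Each trial: draw n coin flips and estimate p by the sample mean.
estimates = rng.binomial(n, p_true, size=trials) / n
empirical_variance = estimates.var()

# Cramer-Rao bound: 1 / (n * I(p)) with I(p) = 1 / (p (1 - p)).
crb = p_true * (1 - p_true) / n

# Sample complexity implied by the bound for a target standard error.
target_std = 0.01
required_n = math.ceil(p_true * (1 - p_true) / target_std ** 2)
print(empirical_variance, crb, required_n)
```

The simulated variance sits at the bound because the sample mean is an efficient estimator for this model. The last line shows how the bound (a property of the model) turns into a sample requirement once an accuracy target is fixed.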


**What is the quantum Cramér-Rao bound?**

The Cramér-Rao bound is a concept from statistics that provides a lower bound on the variance of an unbiased estimator of a parameter. In other words, it gives a measure of the best possible precision that can be achieved when estimating that parameter from a given set of data.

The quantum Cramér-Rao bound extends this concept to the realm of quantum mechanics. In a quantum mechanical context, the "parameters" to be estimated might be properties of a quantum state, such as its mean or variance. The "data" used to make these estimates come from the results of measurements performed on the quantum state.

The quantum Cramér-Rao bound then provides a limit on the precision with which these properties can be estimated, given the inherent uncertainties of quantum mechanics. This can be particularly useful in quantum metrology, where the goal is to measure physical parameters as precisely as possible using quantum systems.

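The quantum Cramér-Rao bound is given by the inverse of the **quantum Fisher information matrix**. For a fixed measurement, the classical Fisher information matrix $F(\theta)$ is calculated using the quantum state and the POVM (Positive Operator-Valued Measure) elements describing that measurement. The quantum Fisher information matrix $H(\theta)$ depends on the state alone and upper-bounds $F(\theta)$ for every POVM. For $n$ independent repetitions this gives $\operatorname{Cov}(\hat\theta) \ge \tfrac{1}{n}F(\theta)^{-1} \ge \tfrac{1}{n}H(\theta)^{-1}$. The bound can provide insights into the optimal measurement strategies for estimating the parameters of a quantum state, and can help guide the design of quantum sensors and other quantum technologies.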

**How exactly can you use quantum cramer-rao bound for optimal strategies in quantum state estimation?**

The quantum Cramér-Rao bound (QCRB) is a fundamental result in quantum estimation theory that provides a lower bound on the covariance matrix of any unbiased quantum estimator. It characterizes the best possible precision that can be achieved in estimating a parameter of interest from quantum measurements.

To use the quantum Cramér-Rao bound for optimal strategies in quantum state estimation, you can follow these steps:

1. Define the parameter of interest: Identify the parameter that you want to estimate. This parameter could be associated with a quantum state, such as its purity, coherence, or parameters describing a quantum gate.

2. Choose an estimation strategy: Determine the measurement scheme or experiment that you will use to estimate the parameter. This involves selecting a set of measurement operators or observables that will be performed on the quantum state to extract information about the parameter.

3. Compute the Fisher information matrix: Calculate the Fisher information matrix (FIM) associated with the chosen measurement strategy. The FIM quantifies the amount of information the measurements provide about the parameter and is related to the precision of the estimation.

4. Evaluate the quantum Cramér-Rao bound: Use the quantum Fisher information matrix (QFIM), which depends only on the parameterized state and not on the chosen measurement, to compute the quantum Cramér-Rao bound. The QCRB sets a lower bound on the covariance matrix of any unbiased estimator of the parameter. The covariance matrix determines the uncertainty or precision of the estimated parameter.

5. Optimize the estimation strategy: The quantum Cramér-Rao bound provides a benchmark for the best achievable precision. To find an optimal estimation strategy, you can vary the choice of measurement operators, measurement settings, or other parameters in the experiment so that the classical Cramér-Rao bound of your measurement (from step 3) approaches the QCRB.

6. Assess the estimation performance: Compare the obtained precision with the quantum Cramér-Rao bound to evaluate the performance of your estimation strategy. If the achieved precision approaches the QCRB, it indicates that your strategy is close to the optimal precision allowed by quantum mechanics.

It's important to note that the QCRB assumes ideal measurement conditions and does not take into account practical limitations, such as noise, imperfections in the experimental setup, or systematic errors. These factors may degrade the actual estimation performance and deviate from the ideal precision predicted by the QCRB.

Nevertheless, the quantum Cramér-Rao bound serves as a powerful tool for understanding the fundamental limits of precision in quantum state estimation. It guides the design of optimal estimation strategies and provides insights into the trade-offs between precision and experimental resources in quantum measurement scenarios.


*How exactly:*

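The quantum Cramér-Rao bound (QCRB) is usually stated as a matrix inequality on the covariance of any unbiased estimator $\hat\theta$ built from $N$ independent repetitions:

> $\operatorname{Cov}(\hat\theta) \geq \frac{1}{N} F^{-1}$

For several parameters it is often reduced to a scalar bound by weighting the errors with a matrix $M$:

> $\operatorname{Tr}\left(M \operatorname{Cov}(\hat\theta)\right) \geq \frac{1}{N} \operatorname{Tr}\left(F^{-1} M\right)$

where:
- The right-hand side is the quantum Cramér-Rao bound, a lower bound on the (weighted) covariance of any unbiased estimator.
- F is the quantum Fisher information matrix, which quantifies the maximum amount of information about the parameters that any measurement on the state can provide.
- M is a symmetric, positive semidefinite weight matrix that specifies which combinations of parameter errors matter (the identity matrix gives the sum of the variances).

To calculate the QCRB, you need the quantum Fisher information matrix (F) of the state family and the weight matrix (M) for the quantities you care about. For a specific measurement, the classical Fisher information matrix computed from the measurement operators and the quantum state gives the bound that this particular measurement can reach.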

A simple test case is a single-qubit state measured with Pauli operators: the outcome probabilities follow from the Born rule, and the Fisher information follows from their derivatives with respect to the parameter.

Such a calculation is only a basic framework. In practice, you adapt the state family and the measurement operators to your specific estimation problem and measurement scheme.
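A minimal sketch of such a calculation, under stated assumptions: the state family $\cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$, the helper names (`state`, `outcome_probs`, `classical_fisher`, `quantum_fisher`), the value of $\theta$ and the number of shots are all illustrative choices, not taken from the text. It compares the classical Fisher information of each Pauli measurement with the pure-state quantum Fisher information, using finite differences for the derivatives.

```python
import numpy as np

# Pauli matrices used as measurement observables.
PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)

def state(theta):
    """Assumed state family: cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def outcome_probs(pauli, theta):
    """Born-rule probabilities of the +1 and -1 outcomes of a Pauli measurement."""
    psi = state(theta)
    expval = np.real(np.vdot(psi, pauli @ psi))
    return np.array([(1 + expval) / 2, (1 - expval) / 2])

def classical_fisher(pauli, theta, h=1e-5):
    """Classical Fisher information sum_k (dp_k/dtheta)^2 / p_k of one measurement."""
    p = outcome_probs(pauli, theta)
    dp = (outcome_probs(pauli, theta + h) - outcome_probs(pauli, theta - h)) / (2 * h)
    keep = p > 1e-12  # skip outcomes that never occur
    return float(np.sum(dp[keep] ** 2 / p[keep]))

def quantum_fisher(theta, h=1e-5):
    """Pure-state quantum Fisher information 4(<dpsi|dpsi> - |<psi|dpsi>|^2)."""
    psi = state(theta)
    dpsi = (state(theta + h) - state(theta - h)) / (2 * h)
    return float(4 * (np.real(np.vdot(dpsi, dpsi)) - abs(np.vdot(psi, dpsi)) ** 2))

theta, shots = 0.7, 1000
f_q = quantum_fisher(theta)
print("QCRB on the variance:", 1 / (shots * f_q))
for name, pauli in [("X", PAULI_X), ("Y", PAULI_Y), ("Z", PAULI_Z)]:
    print(name, "classical Fisher information:", classical_fisher(pauli, theta))
```

For this particular family the X and Z measurements should saturate the quantum Fisher information, while the Y measurement carries no information about $\theta$ because its outcome probabilities do not depend on it.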

**What is the connection between the (quantum) Cramér-Rao bound and the Fisher information matrix?**

The Cramér-Rao Bound (CRB), including its quantum version, is closely related to the Fisher Information Matrix (FIM). In classical statistics, the CRB is derived from the FIM, which measures the amount of information that an observable random variable carries about an unknown parameter that we are trying to estimate.

The Fisher Information Matrix (FIM) is a matrix whose entries are the expected products of the first derivatives of the log-likelihood with respect to the parameters (equivalently, under regularity conditions, the negative expected second derivatives). The inverse of the FIM gives the Cramér-Rao Bound, which provides a lower limit on the covariance of any unbiased estimator. In other words, the CRB derived from the FIM tells us the smallest possible variance (or covariance in the multivariate case) that we can expect from an unbiased estimator.
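As a small classical illustration, the sketch below checks the bound for a Bernoulli parameter $p$, where the Fisher information of one sample is $1/(p(1-p))$ and the sample mean is an unbiased estimator that attains the bound. The function names, the chosen values of $p$ and $n$, and the number of simulated trials are assumptions made for this example.

```python
import numpy as np

def bernoulli_fisher(p):
    """Fisher information of a single Bernoulli(p) sample: 1 / (p (1 - p))."""
    return 1.0 / (p * (1.0 - p))

def crb(p, n):
    """Cramér-Rao bound on the variance of an unbiased estimator from n samples."""
    return 1.0 / (n * bernoulli_fisher(p))

def empirical_variance(p, n, trials=20000, seed=0):
    """Variance of the sample mean over many simulated experiments."""
    rng = np.random.default_rng(seed)
    estimates = rng.binomial(n, p, size=trials) / n
    return float(np.var(estimates))

p_true, n_samples = 0.3, 50
print("CRB:", crb(p_true, n_samples))
print("empirical variance of the sample mean:", empirical_variance(p_true, n_samples))
```

The two printed numbers should agree up to simulation noise, which is what "efficient estimator" means in this setting.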

In the quantum context, the concept is extended to quantum states and quantum measurements. The Quantum Cramér-Rao Bound (QCRB) is defined in terms of the quantum Fisher information, which is a generalization of the classical Fisher information to quantum systems. The QCRB provides a lower bound on the variance of any unbiased estimator of a parameter that is encoded in a quantum state. Like in the classical case, the QCRB is derived from the quantum Fisher information matrix, providing a fundamental limit on precision in quantum parameter estimation.

**How is the Cramér-Rao bound used in machine learning?**

In machine learning, the Cramér-Rao Bound (CRB) is often used as a tool for understanding the fundamental limits of learning algorithms, particularly in the field of parameter estimation and model selection. Here are some specific applications:

1. Variance Estimation: As in any estimation setting, the CRB provides a theoretical lower limit for the variance of an unbiased estimator. This can help understand how well a learning algorithm might perform in terms of parameter estimation, given a certain amount of data.

2. Model Complexity: The CRB can provide insights into how model complexity affects estimation accuracy. A model with too many parameters may have a higher CRB (i.e., higher variance for the best possible estimator), indicating that it could be more prone to overfitting.

3. Algorithm Evaluation: Researchers might use the CRB to evaluate and compare the performance of different learning algorithms. If an algorithm's performance is close to the CRB, it might be deemed near-optimal.

4. Designing Neural Networks: In the design of neural networks, the CRB can be used to understand the limit of what the network can learn from the data, which can help in making decisions about network architecture and training strategies.

Remember that the CRB provides a theoretical limit under certain conditions. These conditions might not always hold in practical machine learning applications, and so the CRB should be used as one tool among many for understanding and improving machine learning systems.


**How is the Cramér-Rao bound used in financial economics?**

The Cramér-Rao Bound (CRB) is a lower bound for the variance of an unbiased estimator. In financial economics, it's used in a few ways:

1. Parameter Estimation: The CRB provides a benchmark for how precisely we can estimate parameters, such as those in asset pricing models, given our data. It's often used in the context of maximum likelihood estimators, which attain the CRB asymptotically under regularity conditions.

2. Efficiency: The CRB is used to determine the efficiency of an estimator. If an unbiased estimator reaches the Cramér-Rao lower bound, it is said to be efficient. This is important in areas like portfolio optimization and asset pricing where accuracy and efficiency of estimates are key.

3. Model Comparison: By comparing the CRBs of different models or estimators, analysts can make inferences about which model might give more precise estimates given the data. This can aid in model selection and comparative analyses.

4. Hypothesis Testing: The CRB also plays a role in statistical hypothesis testing, which is widely used in financial economics for tasks like testing the significance of factors in asset pricing models.

Please note that while the Cramér-Rao Bound provides a theoretical lower bound on variance, it's based on a number of assumptions, and these assumptions might not always hold in real-world financial data.


###### *Sample & Query Complexity*

> "information-theoretic measures of learning complexity, namely **query complexity of exact learning** and **sample complexities of PAC and agnostic learning**"

Number of queries as measure of complexity

[Sample Complexity](https://en.m.wikipedia.org/wiki/Sample_complexity): number of training-samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1. The sample complexity of a concept class C is the sample complexity of the most efficient learner for C

* A larger sample size will typically lead to a more accurate model, but it will also require more data and computational resources.

* A learning algorithm with low sample complexity is more efficient because it requires fewer training examples to learn the target function.

* Example: to achieve an error rate of ε = 0.01 with confidence 1 - δ = 0.95 (so δ = 0.05), the number of samples required is the sample complexity M(ε, δ) = M(0.01, 0.05) of the learning algorithm (PAC learning).

* Learners (Algorithms) vs Examples (Samples): Quantum learners can be more powerful than classical learners in terms of time complexity, but quantum examples (samples) obey the same upper and lower sample-complexity bounds as classical examples, up to constant factors.

* If we are only interested in a particular class of target functions (e.g., only linear functions), then the sample complexity is finite and depends linearly on the [VC dimension](https://en.m.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension) of the class of target functions. The sample complexity of C is tightly determined by a combinatorial parameter called the VC dimension of C.

* Factors that can affect the sample complexity of a learning problem:
  * The size of the hypothesis space: A larger hypothesis space will require more training examples to learn the target function.
  * The error tolerance: A smaller error tolerance will require more training examples to learn the target function.
  * The noise level: A higher noise level will require more training examples to learn the target function.
  * The complexity of the target function: A more complex target function will require more training examples to learn.

* The literature on quantum PAC learning finds that **quantum and classical sample complexities are equivalent, but the time complexities are not** [Source](https://iopscience.iop.org/article/10.1088/1367-2630/abac39/meta)
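To make the dependence on the hypothesis space, the error tolerance and the confidence concrete, here is a hedged sketch of the textbook bound for a finite hypothesis class in the realizable PAC setting, $m \geq \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$. The function name `pac_sample_bound` and the class size $2^{20}$ are assumptions for illustration; the bound is sufficient, not tight, and does not cover infinite classes (where the VC dimension replaces $\ln|H|$).

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite class (realizable case)."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# Error rate 0.01 with confidence 0.95 (delta = 0.05), hypothetical class of 2**20 hypotheses.
m = pac_sample_bound(hypothesis_count=2 ** 20, epsilon=0.01, delta=0.05)
print("sufficient number of samples:", m)
```

Halving ε doubles the bound, while squaring the size of the hypothesis class only doubles the $\ln|H|$ term.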

*Exponential sample complexity when data is from quantum sensors. Time complexity is more subtle.*

**Sample complexity: is there an advantage for quantum samples?**
* (Can the same thing be learned from fewer quantum examples than classical examples?)
* Our main result in this paper is that quantum examples are not actually more powerful than classical labeled examples in the PAC model and in the agnostic model: we prove that the lower bounds on classical sample complexity of Eq. (1) and Eq. (2) hold for quantum examples as well
* despite several distribution-specific speedups, quantum examples do not significantly reduce sample complexity if we require our learner to work for all distributions D. This should be contrasted with the situation when considering the time complexity of learning

**Time complexity advantage of quantum samples:**
* If one-way functions exist, there is a concept class that can be learned exactly in polynomial time using quantum membership queries, but that takes superpolynomial time to learn from classical membership queries.
* "Predictive Quantum" (PQ), a variation of quantum PAC learning
* Quantum algorithms have been considered in learning contexts such as clustering via minimum spanning tree, divisive clustering, and k-medians, using variants of Grover's algorithm to improve the time complexity of the analogous classical algorithms.
* Another line of work studies how well n-qubit quantum states can be learned from measurement results. In general, an n-qubit state ρ is specified by exp(n) many parameters, and exp(n) measurement results on equally many copies of ρ are needed to learn a good approximation of ρ (say, in trace distance).

*PAC and Occam Learning*

* PAC learning is a method to measure sample complexity.

* **PAC Learning**: The Probably Approximately Correct (PAC) learning framework provides a formal definition of learning and can be used to derive bounds on the sample complexity needed to learn a function to a given degree of accuracy with a certain probability.

* Leslie Valiant's Probably Approximately Correct (PAC) model gives a precise complexity-theoretic definition of what it means for a concept class to be (efficiently) learnable. *Source: Optimal Quantum Sample Complexity of Learning Algorithms (2017)*

* [PAC Learning](https://en.m.wikipedia.org/wiki/Error_tolerance_(PAC_learning)): learner is evaluated on its predictive power of a test set.

*Occam learning*

* [Occam learning](https://en.m.wikipedia.org/wiki/Occam_learning): the objective of the learner is to output a succinct representation of received training data.
* Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions, conjunctions with few relevant variables and decision lists.
* Occam algorithms have also been shown to be successful for PAC learning in the presence of errors, probabilistic concepts, function learning and Markovian non-independent examples.

**'Up to constant factors' in sample complexity**

In computational learning theory, the phrase "up to constant factors" means that two quantities agree within a fixed multiplicative constant that does not depend on the problem parameters. For example, saying that the sample complexity $m$ of a concept class is $f(\varepsilon, \delta)$ "up to constant factors" means that there are constants $c_1, c_2 > 0$ with $c_1 f(\varepsilon, \delta) \leq m \leq c_2 f(\varepsilon, \delta)$ for all admissible $\varepsilon$ and $\delta$, i.e. $m = \Theta(f)$.

* The reason why this phrase is used is that the exact sample complexity of a learning problem can often be difficult to determine. However, in many cases, it is possible to show that the sample complexity equals some known quantity up to constant factors. This allows us to make general statements about the sample complexity of learning problems, without having to worry about the exact values of the constants.

* For example, **a finite hypothesis class $H$ can be PAC-learned (in the realizable case) with a number of samples that is logarithmic in the size of the hypothesis space and in $1/\delta$, and linear in the inverse of the desired error rate**: $m = O\left(\frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)\right)$. **The bound is stated "up to constant factors" because the $O(\cdot)$ notation hides the leading constant.**

* The phrase "up to constant factors" is also used in other areas of computer science, such as algorithm analysis and complexity theory, where it underlies the $O$, $\Omega$ and $\Theta$ notation. This can be a useful way to make statements about the performance of algorithms or the complexity of problems, without having to worry about the exact values of the constants.

###### *Time Complexity*

Time as measure of complexity

###### *Information Entropy*

Shannon’s information theory and quantum Shannon theory (or quantum information theory)

*Quantum Information Entropy*

In quantum information theory, an entropy function is a measure of the uncertainty or randomness of a quantum state. There are many different entropy functions that have been proposed, each with its own advantages and disadvantages.

Some of the most common entropy functions in quantum information theory include:

* **Von Neumann entropy:** This is the most commonly used entropy function in quantum information theory. It is defined as the Shannon entropy of the eigenvalues of the density matrix of the quantum state.
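* **Tsallis entropy:** This is a one-parameter generalization of the von Neumann entropy that is non-additive for independent systems. It is defined as

> $S_q(\rho) = \frac{1}{q-1} \left( 1 - \sum_i p_i^q \right)$

where $p_i$ are the eigenvalues of the density matrix $\rho$ and $q > 0$, $q \neq 1$ is a real number. The limit $q \to 1$ recovers the von Neumann entropy.
* **Rényi entropy:** This is another one-parameter generalization of the von Neumann entropy, which remains additive for independent systems. It is defined as

> $S_\alpha(\rho) = \frac{1}{1-\alpha} \log \left( \sum_i p_i^\alpha \right)$

where $p_i$ are the eigenvalues of the density matrix $\rho$ and $\alpha \geq 0$, $\alpha \neq 1$ is a real number. The limit $\alpha \to 1$ again recovers the von Neumann entropy.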

* **Conditional entropy:** This measures the remaining uncertainty about subsystem $A$ of a bipartite quantum state given access to subsystem $B$. It is defined as

> $S(A|B) = S(\rho_{AB}) - S(\rho_B)$

where $\rho_{AB}$ is the joint density matrix of the two subsystems, $\rho_B$ is the reduced density matrix of subsystem $B$, and $S(\rho)$ is the von Neumann entropy of the density matrix $\rho$. Unlike its classical counterpart, it can be negative for entangled states.

* **Relative entropy:** This measures the distinguishability between two quantum states. It is defined as


> $D(\rho \| \sigma) = \operatorname{Tr} (\rho \log \rho) - \operatorname{Tr} (\rho \log \sigma)$,


where $\rho$ and $\sigma$ are the two quantum states.

These are just a few of the many entropy functions that have been proposed in quantum information theory. The choice of which entropy function to use depends on the specific application.
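The entropies listed above can be evaluated numerically from the eigenvalues of the density matrix. The following is a sketch under assumptions: natural logarithms are used, the Tsallis entropy is taken in its standard form $\frac{1}{q-1}\left(1 - \sum_i p_i^q\right)$, and the helper names and the small eigenvalue floor used inside the matrix logarithm are choices made for this example.

```python
import numpy as np

def _eigenvalues(rho):
    """Nonzero eigenvalues of a Hermitian density matrix."""
    p = np.linalg.eigvalsh(rho)
    return p[p > 1e-12]

def von_neumann(rho):
    """S(rho) = -sum_i p_i log p_i."""
    p = _eigenvalues(rho)
    return float(-np.sum(p * np.log(p)))

def renyi(rho, alpha):
    """S_alpha(rho) = log(sum_i p_i^alpha) / (1 - alpha), alpha != 1."""
    p = _eigenvalues(rho)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def tsallis(rho, q):
    """S_q(rho) = (1 - sum_i p_i^q) / (q - 1), q != 1."""
    p = _eigenvalues(rho)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def _log_psd(rho, floor=1e-12):
    """Matrix logarithm via eigendecomposition, with eigenvalues floored."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(np.clip(w, floor, None))) @ v.conj().T

def relative_entropy(rho, sigma):
    """D(rho || sigma) = Tr(rho log rho) - Tr(rho log sigma), sigma assumed full rank."""
    return float(np.real(np.trace(rho @ (_log_psd(rho) - _log_psd(sigma)))))

mixed = np.eye(2) / 2            # maximally mixed qubit
pure = np.diag([1.0, 0.0])       # pure state |0><0|
print("von Neumann:", von_neumann(mixed), von_neumann(pure))
print("Renyi-2, Tsallis-2 (mixed):", renyi(mixed, 2), tsallis(mixed, 2))
print("D(pure || mixed):", relative_entropy(pure, mixed))
```

For the maximally mixed qubit the von Neumann and Rényi entropies both equal $\ln 2$, which is a quick sanity check on the implementation.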

Entropy functions are used in quantum information theory for a variety of purposes, including:

* Quantifying the entanglement of quantum states.
* Measuring the amount of information that can be stored in a quantum system.
* Determining the security of quantum communication protocols.
* Developing new quantum algorithms.

Entropy functions are a powerful tool for understanding the properties of quantum systems and for developing new quantum technologies.

*Shannon Entropy, Kolmogorov Entropy, Mutual information (incl. Kullback-Leibler) and Holevo's theorem*

**What is the difference between Shannon entropy and Kolmogorov entropy?**

Shannon entropy and Kolmogorov complexity (sometimes loosely called Kolmogorov entropy or algorithmic entropy) are both measures of the randomness, disorder, or complexity in a set of data, but they come from different theoretical perspectives and are calculated in different ways.

**Shannon entropy** is a concept from information theory. It quantifies the expected value of the information contained in a message. For a given probability distribution, Shannon entropy measures the average unpredictability or randomness in the data. Shannon entropy is typically computed using logarithms and probabilities, and it is usually applied in the context of a stochastic (random) process.

On the other hand, **Kolmogorov entropy or Kolmogorov complexity**, which comes from algorithmic information theory, is a measure of the computational resources needed to specify or reproduce a string, dataset, or other pieces of information. Specifically, the Kolmogorov complexity of a string is the length of the shortest possible description of the string in some fixed universal description language. It's not computable in general, but it gives us a way to talk about the "absolute" complexity of an individual string independently of any specific probability distribution.

To summarize, Shannon entropy is a probabilistic measure of average unpredictability in data, while Kolmogorov complexity is a measure of the shortest description needed to reproduce one particular piece of data.
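The difference can be made tangible with two strings that have the same symbol frequencies. Kolmogorov complexity is not computable, so the sketch below uses the length of a `zlib`-compressed encoding as a crude upper-bound proxy; that substitution, the helper names and the two test strings are assumptions made for illustration.

```python
import math
import random
import zlib
from collections import Counter

def empirical_entropy_bits(text):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def compressed_size(text):
    """Length in bytes of the zlib-compressed string (proxy for description length)."""
    return len(zlib.compress(text.encode("ascii"), 9))

periodic = "ab" * 5000                       # highly regular string
rng = random.Random(0)
shuffled = "".join(rng.choice("ab") for _ in range(10000))  # pseudo-random string

for name, text in [("periodic", periodic), ("pseudo-random", shuffled)]:
    print(name, empirical_entropy_bits(text), compressed_size(text))
```

Both strings have a per-symbol Shannon entropy close to 1 bit, yet the periodic one compresses to a far shorter description, which is the kind of structure Kolmogorov complexity captures and symbol-frequency entropy does not.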

**Are Fisher information and Kullback–Leibler divergence related?**

Yes, the Fisher information and [Kullback–Leibler (KL) divergence](https://en.m.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) are related. The Fisher information is the negative expected value of the second derivative of the log likelihood function, while the KL divergence is a measure of the difference between two probability distributions. For two nearby members of the same parametric family, the KL divergence is, to second order, a quadratic form whose coefficients are given by the elements of the Fisher information matrix: $D(p_\theta \| p_{\theta+\delta}) \approx \frac{1}{2} \delta^T I(\theta) \delta$. This means that the Fisher information matrix defines the local curvature of the KL divergence.

In other words, the Fisher information measures how sharply the log likelihood function changes when the parameters of the distribution are changed. The KL divergence measures how much one distribution differs from another. The relationship between the two quantities means that the Fisher information can be used to approximate the KL divergence between two nearby distributions of the same family.
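A quick numerical check of this local relationship, sketched for a Bernoulli family (an assumed example, with illustrative function names): the exact KL divergence between Bernoulli($p$) and Bernoulli($p+\delta$) is compared with the quadratic approximation $\frac{1}{2} I(p) \delta^2$, where $I(p) = 1/(p(1-p))$.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def fisher_bernoulli(p):
    """Fisher information of a single Bernoulli(p) sample."""
    return 1.0 / (p * (1 - p))

p, delta = 0.3, 1e-3
exact = kl_bernoulli(p, p + delta)
quadratic = 0.5 * fisher_bernoulli(p) * delta ** 2
print("exact KL:", exact)
print("Fisher quadratic form:", quadratic)
```

The two values agree up to a relative error of order $\delta$, and the agreement degrades as $\delta$ grows because higher-order terms are no longer negligible.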

The Fisher information is a useful quantity in statistics and machine learning. It can be used to estimate the parameters of a distribution, to make inferences about the distribution, and to design experiments. The KL divergence is also a useful quantity in statistics and machine learning. It can be used to measure the similarity between two distributions, to select features for machine learning models, and to evaluate the performance of machine learning models.

Here are some additional points about the relationship between the Fisher information and the KL divergence:

* The Fisher information is always non-negative, and it is equal to zero if and only if the score (the derivative of the log likelihood with respect to the parameter) vanishes almost surely, i.e. the distribution does not change locally with the parameter.
* The KL divergence is always non-negative, and it is equal to zero if and only if the two distributions are the same (almost everywhere).
* The Fisher information is a measure of the sensitivity of the log likelihood function to changes in the parameters. The larger the Fisher information, the more sensitive the log likelihood function is to changes in the parameters.
* The KL divergence is a measure of dissimilarity between two distributions, but not a true distance: it is not symmetric and does not satisfy the triangle inequality. The smaller the KL divergence, the closer the two distributions are.

**Are Mutual information and Kolmogorov entropy related?**


Yes, but only indirectly. [Mutual information](https://en.m.wikipedia.org/wiki/Mutual_information) measures the amount of information that one random variable X conveys about another random variable Y, and it is defined through Shannon entropy, which measures the average amount of information required to describe a random variable. Kolmogorov complexity (Kolmogorov entropy) instead measures the description length of an individual object.

In terms of Shannon entropy, mutual information is given by the following equation:

```
I(X; Y) = H(X) - H(X | Y)
```

where:

* I(X; Y) is the mutual information between X and Y
* H(X) is the entropy of X
* H(X | Y) is the conditional entropy of X given Y

The first term on the right-hand side of the equation, H(X), is the amount of information required to describe X. The second term on the right-hand side, H(X | Y), is the amount of information required to describe X given Y. The difference between these two terms is the amount of information that Y conveys about X.

In other words, the mutual information between X and Y is the amount of information that is shared between X and Y. This information can be used to reduce the amount of information required to describe X.

This identity makes the dependence between random variables quantitative. For example, if two random variables are independent, then their mutual information is zero. This means that there is no shared information between the two variables, and they cannot be used to reduce the amount of information required to describe each other.

On the other hand, if two random variables are correlated, then their mutual information is nonzero. This means that there is some shared information between the two variables, and they can be used to reduce the amount of information required to describe each other.
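The two cases can be checked directly from a joint probability table. The sketch below is an illustration with assumed names and example tables; it computes $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}$, which is equal to $H(X) - H(X \mid Y)$.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X as a column
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y as a row
    mask = joint > 0                        # 0 * log(0) is taken as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

independent = [[0.25, 0.25], [0.25, 0.25]]  # two independent fair bits
correlated = [[0.5, 0.0], [0.0, 0.5]]       # two perfectly correlated fair bits
print("independent:", mutual_information(independent))
print("correlated:", mutual_information(correlated))
```

Independent bits share no information, while perfectly correlated fair bits share exactly one bit.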

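The Kolmogorov-complexity counterpart of this identity is algorithmic mutual information, $I(x:y) = K(x) - K(x \mid y)$ (up to logarithmic terms), which measures how much knowing the string $y$ shortens the shortest description of the string $x$. Both notions have applications in a wide variety of fields, including machine learning, data compression, and cryptography.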

**What is Holevo's theorem?**

https://en.m.wikipedia.org/wiki/Holevo%27s_theorem

Holevo's theorem, also known as Holevo's bound, is a fundamental result in quantum information theory that places an upper limit on the amount of classical information that can be extracted from a quantum system.

In more precise terms, the theorem states that if we have a quantum system prepared in one of several states $\rho_i$, each with probability $p_i$, then the amount of classical information that can be obtained by any measurement on the system (the accessible information) is bounded by the Holevo quantity $\chi$, which is given by:

> $\chi = S\left(\sum_i p_i \rho_i\right) - \sum_i p_i S(\rho_i)$

where $S(\rho)$ denotes the von Neumann entropy of the state $\rho$, and the sum $\sum_i p_i \rho_i$ represents the overall state of the system averaged over the probabilities $p_i$.

This theorem has important implications for quantum communication. Despite the fact that a single qubit can exist in a superposition of states and appears to carry an infinite amount of information, <font color="blue">Holevo's theorem tells us that when we measure that qubit, we can extract at most 1 bit of classical information from it</font>. ***This makes clear the distinction between information capacity of quantum states and the accessible information that can be extracted via measurements.***
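A small numerical sketch of the Holevo quantity for single-qubit ensembles, with entropies in bits. The helper names and the two example ensembles (orthogonal states $|0\rangle, |1\rangle$ and non-orthogonal states $|0\rangle, |+\rangle$, each with probability 1/2) are assumptions chosen for illustration.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def holevo_chi(probs, states):
    """Holevo quantity S(sum_i p_i rho_i) - sum_i p_i S(rho_i)."""
    average = sum(p * rho for p, rho in zip(probs, states))
    return entropy_bits(average) - sum(p * entropy_bits(rho) for p, rho in zip(probs, states))

def projector(vec):
    """Density matrix |v><v| of a (normalized) state vector."""
    vec = np.asarray(vec, dtype=complex)
    vec = vec / np.linalg.norm(vec)
    return np.outer(vec, vec.conj())

zero, one, plus = projector([1, 0]), projector([0, 1]), projector([1, 1])
print("orthogonal ensemble:", holevo_chi([0.5, 0.5], [zero, one]))
print("non-orthogonal ensemble:", holevo_chi([0.5, 0.5], [zero, plus]))
```

The orthogonal ensemble reaches the single-qubit maximum of 1 bit, while the non-orthogonal ensemble stays strictly below it, so less than one bit about the label can be recovered by any measurement.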

***How are Holevo's theorem and Kolmogorov entropy related?***

* **Holevo's theorem** is a limitative theorem in quantum computing that establishes an upper bound on the amount of classical information that can be learned about which state of an ensemble was prepared (accessible information). It states that the Shannon mutual information between the preparation label and the outcome of any measurement is upper bounded by the Holevo quantity, which depends only on the von Neumann entropies of the average state and of the individual states, not on the measurement.
* **Kolmogorov entropy** (Kolmogorov complexity) is the length of the shortest program that outputs a given string. It is a property of an individual object rather than of a probability distribution.

The link between the two is therefore indirect. Holevo's bound is formulated with Shannon mutual information and von Neumann entropy, and Shannon entropy is in turn connected to Kolmogorov complexity: for a computable distribution, the expected Kolmogorov complexity of a sample is approximately its Shannon entropy.

In other words, Holevo's theorem states that the amount of information that can be learned about the preparation of a quantum state is limited by the Holevo quantity of the ensemble, whatever measurement is used to extract the information. For $n$ qubits the Holevo quantity is at most $n$ bits.

The Holevo bound has been used to study a variety of problems in quantum information theory, such as quantum cryptography and quantum communication, for example to show that $n$ qubits cannot be used to transmit more than $n$ classical bits.

**What is metric entropy?**

In mathematics, metric entropy is a measure of the complexity of a metric space. It is defined as the logarithm of the minimum number of open balls of radius δ needed to cover the space.

Formally, let (X, d) be a metric space and δ be a positive real number. The δ-packing number of (X, d), denoted by M(δ, X, d), is the maximum number of points in X such that every two of them are more than δ apart. Packing numbers and covering numbers agree up to a factor of 2 in δ, so either can be used. The metric entropy of (X, d) is then defined as

```
H(δ, X, d) = log M(δ, X, d)
```

The metric entropy can be used to measure the complexity of a metric space in a number of ways. For example, a space with high metric entropy is more difficult to compress than a space with low metric entropy. Additionally, the metric entropy can be used to estimate the rate of convergence of certain algorithms.
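For a finite point cloud, a δ-separated set that also covers every point within δ can be built greedily, and the logarithm of its size gives a rough estimate of the metric entropy at scale δ. This is a sketch under assumptions: the greedy construction yields a maximal (not necessarily maximum) packing, so its size only brackets the true covering and packing numbers, and the function name and the random test data are illustrative.

```python
import numpy as np

def greedy_net(points, delta):
    """Greedily pick centers so that chosen centers are pairwise more than delta apart.

    Every point that is skipped lies within delta of an earlier center, so the
    result is both a delta-packing and a delta-cover of the input points.
    """
    centers = []
    for point in points:
        if all(np.linalg.norm(point - c) > delta for c in centers):
            centers.append(point)
    return np.array(centers)

rng = np.random.default_rng(0)
cloud = rng.random((500, 2))          # 500 random points in the unit square
for delta in (0.4, 0.2, 0.1):
    net = greedy_net(cloud, delta)
    print("delta:", delta, "net size:", len(net), "log size:", np.log(len(net)))
```

As δ shrinks, more centers are needed, which is the growth that metric entropy quantifies.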

For a set of words of length n (with, for example, the Hamming distance as the metric), the metric entropy is calculated from the minimum number of balls of radius δ needed to cover the set. The larger the metric entropy, the more representative words are needed to approximate every word in the set, and the less compressible the set is.

The covering number N(δ, X, d) is the minimum number of balls of radius δ needed to cover a metric space. It is a measure of the complexity of the space. The larger the covering number, the more complex the space.

Here are some of the uses of metric entropy:

* In information theory, metric entropy is used to measure the amount of information contained in a sequence of symbols.
* In machine learning, metric entropy is used to estimate the complexity of a data set.
* In computational geometry, metric entropy is used to design algorithms for searching and clustering.
* In dynamical systems, metric entropy is used to measure the complexity of a dynamical system (there the term usually refers to the Kolmogorov–Sinai entropy, a related but distinct notion).


*Metrics and Information Entropy*

- Among all continuous distributions, Gaussians maximize differential entropy subject to moment constraints on E[x] and E[x^2]. Gaussians = continuous exponential family

- Among all discrete distributions, lattice Gaussians maximize Shannon entropy with the moments E[x] and E[x^2] taken with respect to a lattice. Lattice Gaussians = discrete exponential family

https://franknielsen.github.io/LatticeGaussian/LatticeGaussian.html

###### *Concentration Inequalities*

Concentration Inequalities = Bounds on Deviation of Estimator

* Background: PAC learning classical functions with quantum learners, and using [concentration inequalities](https://en.m.wikipedia.org/wiki/Concentration_inequality) to bound the sample complexity of tomography like problems

* Concentration inequalities are used to analyze the performance of machine learning algorithms. **They can be used to show that a learning algorithm will converge to the correct hypothesis with high probability, even if the data is noisy or incomplete.** Concentration inequalities = bounds on deviations from the expected value.

* Concentration inequalities provide a lower bound on the probability that the random variable will not deviate too far from its expected value. The bound depends on the number of samples and the allowed deviation; in practice one often fixes a confidence level such as 95% or 99% and solves for the required number of samples.

  * [Hoeffding's inequality](https://de.m.wikipedia.org/wiki/Hoeffding-Ungleichung): This inequality bounds the probability that the sum of independent bounded random variables deviates from its expected value by more than a certain amount.
  * Chernoff bounds: These bounds apply Markov's inequality to the moment generating function and give exponentially decaying tail bounds; Hoeffding's inequality can be derived in this way.
  * Bernstein's inequality: This inequality also uses the variance of the random variables and is tighter than Hoeffding's inequality when the variance is small compared with the range.





**Concentration inequalities are related to both sample complexity and model complexity.**

* **Sample complexity:** Concentration inequalities can be used to derive bounds on the sample complexity of learning problems. For example, Hoeffding's inequality combined with a union bound shows that the **sample complexity of learning with a finite hypothesis class to a certain level of accuracy is proportional to the logarithm of the number of hypotheses**.
* **Model complexity:** Concentration inequalities can be used to control the effect of model complexity. Combined with a complexity measure such as the VC dimension of the hypothesis space, they give uniform bounds on the gap between training error and true error. This can be used to design machine learning algorithms that are guaranteed to generalize well to new data.

In general, concentration inequalities can be used to analyze the performance of machine learning algorithms and to design algorithms that are efficient and robust. Here are some specific examples of how concentration inequalities can be used to relate sample complexity and model complexity:

* **PAC learning:** In PAC learning, the sample complexity of a learning algorithm depends on the VC dimension of the hypothesis space. Concentration inequalities, together with the VC dimension, yield uniform convergence bounds, which can then be used to derive bounds on the sample complexity.
* **Regularization:** Regularization is a technique used to control the complexity of machine learning models. Concentration inequalities can be used to analyze the effects of regularization on the generalization performance of machine learning models.
* **Ensemble learning:** Ensemble learning is a technique that combines multiple machine learning models to improve performance. Concentration inequalities can be used to analyze the performance of ensemble learning algorithms.

**And where do concentration inequalities belong?**

Concentration inequalities belong to **probability theory**. They are used to bound the probability that a random variable deviates from some value (typically, its expected value).

Concentration inequalities are used in a wide variety of fields, including statistics, machine learning, and cryptography. In statistics, they are used to provide guarantees on the accuracy of estimators. In machine learning, they are used to analyze the robustness of algorithms to noise and outliers. In cryptography, they are used to prove the security of cryptographic protocols.

Some common concentration inequalities include:

* Chebyshev's inequality
* Markov's inequality
* Chernoff bounds
* Hoeffding's inequality
* Bernstein's inequality

These inequalities can be used to bound the probability of a random variable deviating from its expected value by a certain amount. For example, Chebyshev's inequality states that the probability of a random variable X deviating from its expected value E(X) by more than k standard deviations is at most 1/k^2.

Concentration inequalities are a powerful tool for analyzing the behavior of random variables. They are used in a wide variety of fields, and they are essential for understanding the robustness of statistical estimators, machine learning algorithms, and cryptographic protocols.

Here are some examples of how concentration inequalities are used in machine learning:

* To bound the generalization error of a learning algorithm.
* To analyze the robustness of a learning algorithm to noise and outliers.
* To design efficient algorithms for learning with high confidence.

Concentration inequalities are a powerful tool for machine learning researchers. They can be used to analyze the behavior of learning algorithms and to design efficient algorithms that can learn with high confidence.

**So concentration inequalities do not belong to model complexity or sample complexity?**

That's right. Concentration inequalities do not belong to model complexity or sample complexity. They are a tool that can be used to analyze both model complexity and sample complexity, but they are not themselves a measure of either one.

Model complexity measures how expressive a model is, for example through its number of parameters. A more complex model can fit the data more closely, but it is also more likely to overfit. Sample complexity is the number of samples needed to reach a given level of accuracy, for example in parameter estimation, where it depends on the Fisher information: a measure of how much information the data provide about the parameters of a model.

Concentration inequalities bound the probability that an empirical quantity deviates from its expectation. For example, Hoeffding's inequality states that the probability that the average of n independent random variables, each taking values in [0, 1], deviates from its expected value by more than ε is at most 2e^(-2nε^2). Applied to the empirical error of a single fixed hypothesis, this bounds the gap between training error and true error whatever the model. Bounding the generalization error of a learned hypothesis additionally requires a measure of model complexity, for example a union bound over the hypothesis class.

In other words, concentration inequalities are a tool that can be used to analyze the behavior of learning algorithms, but they are not themselves a measure of the model complexity or the sample complexity of the learning algorithm.
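
Hoeffding's inequality can be turned directly into a sample-size estimate. The sketch below assumes independent variables bounded in [0, 1]; the function names are illustrative. It evaluates the bound 2e^(-2nε^2) and solves it for the smallest n that pushes the failure probability below a chosen δ.

```python
import math


def hoeffding_bound(n: int, eps: float) -> float:
    """Upper bound on P(|mean - E[mean]| > eps) for n independent variables in [0, 1]."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2))


def samples_needed(eps: float, delta: float) -> int:
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))


n = samples_needed(eps=0.05, delta=0.05)
print(n, hoeffding_bound(n, 0.05))
```

This is the link between the two notions: the inequality is the tool, and the resulting n is a sample-complexity statement.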

#### <font color="Blue">**Computational Complexity**

###### *Computational Complexity Theory*

> **Quantum Computing: understand the ultimate physical limits to computation**

**Most important complexity classes**
* P: Can be solved by a deterministic classical computer in polynomial time
* BPP: Can be solved by a probabilistic classical computer in polynomial time with bounded error
* BQP: Can be solved by a quantum computer in polynomial time with bounded error
* NP: A "yes" answer can be checked by a deterministic classical computer in polynomial time
* QMA: A "yes" answer can be checked by a quantum computer in polynomial time with bounded error
* [PostBQP](https://en.m.wikipedia.org/wiki/PostBQP) = PP

**Further important complexity classes**

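* PSPACE: Can be solved using a polynomial amount of memory
* #P ("sharp P"): Counting the solutions of a problem in NP
* MIP: Can be verified in an interactive proof with multiple provers
* RE: Recursively enumerable problems, where a "yes" answer can be confirmed in finite time
* DQC1: Can be solved in the one-clean-qubit model of quantum computation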
* More: https://complexityzoo.net/Complexity_Zoo


**MIP* = RE** (from Spektrum der Wissenschaft 7/20)

* MIP* contains every computable problem in computer science!
* According to the proof, MIP* is identical to the huge complexity class RE. It comprises all decision problems (those whose answer is yes or no) for which a computer can confirm a "yes" answer in finite time.
* This includes, among others, the most stubborn of all problems, the famous halting problem: determining whether a computer will ever halt during a computation or will keep computing forever.
* Ito, T., Vidick, T.: A multi-prover interactive proof for NEXP sound against entangled provers. ArXiv 1207.0550, 2012
* **Ji, Z. et al.: MIP* = RE. ArXiv 2001.04383, 2020**
* Natarajan, A., Wright, J.: NEEXP in MIP*. ArXiv 1904.05870, 2019

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1646.png)


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1274.png)


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1646.jpg)


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1647.png)

*Source: https://www.researchgate.net/figure/Computability-hierarchy-and-computational-complexity-classes_fig5_341817215*

**(Quantum) Complexity Classes**
* [Complexity_Zoo](https://complexityzoo.net/Complexity_Zoo) and [Complexity class](https://en.m.wikipedia.org/wiki/Complexity_class) AND [Computational_problem](https://en.m.wikipedia.org/wiki/Computational_problem)
* **Problems**: Decision, Sampling, Counting, Verifying, Optimization, Function etc.
* [Decision Problems](https://en.m.wikipedia.org/wiki/Decision_problem) - see also [Search Problem](https://en.m.wikipedia.org/wiki/Search_problem)
  * [PH complexity](https://en.m.wikipedia.org/wiki/PH_(complexity)) - is the union of all complexity classes in the polynomial hierarchy
  * [PSPACE](https://de.m.wikipedia.org/wiki/PSPACE) - Problems that can be solved using a polynomial amount of memory, and possibly exponential time.
  * [PP](https://de.m.wikipedia.org/wiki/Probabilistische_Polynomialzeit) - Decision problems that a probabilistic Turing machine can solve in polynomial time, with the answer being correct in more than half of the cases.
  * Problems in [BQP](https://en.m.wikipedia.org/wiki/BQP) that are not known to be in [BPP](https://de.m.wikipedia.org/wiki/BPP_(Komplexitätsklasse)): [Factorization](https://de.m.wikipedia.org/wiki/Faktorisierung) with [Shor's algorithm](https://de.m.wikipedia.org/wiki/Shor-Algorithmus). BQP = P? Open question. Dequantized algorithms for such problems (high-rank matrix inversion, for example) would imply that classical computers can efficiently simulate quantum computers, i.e., BQP = P, which is not currently considered likely. [Source](https://arxiv.org/abs/1905.10415)
  * [BPP](https://de.m.wikipedia.org/wiki/BPP_(Komplexitätsklasse))
  * [RP](https://en.m.wikipedia.org/wiki/RP_(complexity)) and [ZPP](https://en.m.wikipedia.org/wiki/ZPP_(complexity))
  * [P](https://en.m.wikipedia.org/wiki/P_(complexity)) - problems that can be solved quickly, or "in polynomial time". P $\subseteq$ BQP (i.e. anything you can do efficiently with a classical computer you can do efficiently with a quantum computer). We don't know if that is a strict inclusion! [Source](https://quantumcomputing.stackexchange.com/questions/16506/can-quantum-computer-solve-np-complete-problems)
  * [NP](https://en.m.wikipedia.org/wiki/NP_(complexity)) - problems for which a solution can be checked quickly
    * [List of NP problems](https://en.m.wikipedia.org/wiki/List_of_NP-complete_problems)
    * [NP-Complete](https://en.m.wikipedia.org/wiki/NP-completeness) (= the hardest of the problems whose solutions can be verified quickly), like [3SAT](https://en.m.wikipedia.org/wiki/Boolean_satisfiability_problem). The [Halting problem](https://en.m.wikipedia.org/wiki/Halting_problem) is NP-hard but not NP-complete, because it is undecidable and therefore not in NP. (Grover's algorithm can be applied to the search over candidate solutions of an NP-complete problem, but it gives only a quadratic speedup.)
    * Can quantum computer solve NP-complete problems? - if you solve any NP-complete problem, all other NP problems come as a 'freebie' (not just the NP-complete ones). In that sense, it would be a huge milestone. It is widely believed that quantum computers cannot solve NP-complete problems, but it has never been proven [Source](https://quantumcomputing.stackexchange.com/questions/16506/can-quantum-computer-solve-np-complete-problems)
    * [NP-Hard](https://en.m.wikipedia.org/wiki/NP-hardness)
      * [travelling salesman](https://en.m.wikipedia.org/wiki/Travelling_salesman_problem) (quantum annealing is one heuristic approach; an advantage over classical computers has not been proven)
      * [Sign Problem](https://en.m.wikipedia.org/wiki/Numerical_sign_problem) is NP-hard: [Sign-Problem-Free Fermionic Quantum Monte Carlo: Developments and Applications](https://arxiv.org/pdf/1805.08219.pdf)
  * [Co-NP](https://de.m.wikipedia.org/wiki/Co-NP): Complement of NP. Problems for which a "no" answer can be verified in polynomial time
  * [P versus NP](https://en.m.wikipedia.org/wiki/P_versus_NP_problem): Open question, [video1](https://youtu.be/EHp4FPyajKQ), [video2](https://youtu.be/YX40hbAHx3s)
  * [BQNP: The quantum analogue of NP](https://medium.com/mit-6-s089-intro-to-quantum-computing/bqnp-the-quantum-analogue-of-np-486ed2469c1d) and [What is the relationship between BQP and NP?](https://www.quora.com/What-is-the-relationship-between-BQP-and-NP-1)
* Sampling Problem
  * RP: The class of decision problems for which a randomized polynomial-time algorithm accepts "yes" instances with probability at least 1/2 and never accepts "no" instances.
  * ask for samples from probability distributions. [Source](https://en.m.wikipedia.org/wiki/Quantum_supremacy)
  * [Boson sampling](https://en.m.wikipedia.org/wiki/Boson_sampling) under low enough error: Given a linear optical network (a device that takes in multiple photons, passes them through a series of beam splitters and phase shifters, and then outputs them), where each photon is initially sent into a different input mode, the task is to sample from the probability distribution of photon count outcomes at the network's outputs. In more detail, you start with an arrangement of beam splitters and phase shifters that form a linear optical network. You inject a number of indistinguishable single photons into this network. The photons pass through the network, interfere with each other quantum mechanically, and are finally detected at the output. The task is to produce detection patterns distributed according to these probabilities, not to compute the probabilities themselves.
  * [Sampling the output distribution of random quantum circuits](https://en.m.wikipedia.org/wiki/Quantum_supremacy#Sampling_the_output_distribution_of_random_quantum_circuits) - Google experiment
  * [Solving the sampling problem of the Sycamore quantum circuits](https://arxiv.org/abs/2111.03011)
  * [Quantum Sampling Problems, Boson Sampling and Quantum Supremacy](https://arxiv.org/abs/1702.03061)
  * [Computational advantage of quantum random sampling](https://arxiv.org/abs/2206.04079)
* [Counting Problem](https://en.m.wikipedia.org/wiki/Counting_problem_(complexity))
  * [Sharp-P](https://de.m.wikipedia.org/wiki/Sharp-P): How many solutions are there? #P-complete problems: #SAT, or counting the perfect matchings of a bipartite graph
  * P^#P (P with a #P oracle): Problems solvable in polynomial time given an oracle that answers #P counting queries in a single step.
* Proof Verifier Problems
  * [IP](https://en.m.wikipedia.org/wiki/IP_(complexity))
  * [MIP* = RE ?](https://medium.com/mit-6-s089-intro-to-quantum-computing/mip-re-6e903720c82f)
  * [QMA](https://en.m.wikipedia.org/wiki/QMA): many interesting classes, such as P, BQP and NP, are contained in QMA, so all problems in those classes are also in QMA. However, there are problems in QMA that are not known to be in NP or BQP. A [list of known QMA-complete problems](https://arxiv.org/abs/1212.6312)
    * Quantum circuit/channel property verification (V)
    * Hamiltonian ground state estimation (H), incl. Quantum k-SAT (S)
    * Density matrix consistency (C)
* Optimization problems
  * PLS (polynomial local search) is the class of local search problems in which the cost of a candidate solution can be computed, and either a better neighbouring solution found or local optimality confirmed, in polynomial time. Each improvement step is easy, but reaching a local optimum may take many steps, and a local optimum need not be the global optimum.
  * NPO is the class of optimization problems that are the optimization equivalent to the class NP. An NPO problem is one where a candidate solution can be checked in polynomial time (to see how good it is), and the goal is to find the optimal solution.
  * APX for problems that can be approximated within a constant factor in polynomial time, and many others.
* [Function problems](https://en.m.wikipedia.org/wiki/Function_problem)
  * [FP](https://en.m.wikipedia.org/wiki/FP_(complexity)): The function problem equivalent of P, where the task is to compute a specific output rather than just deciding yes/no.





[Komplexitätstheorie (complexity theory, video in German)](https://youtu.be/ctSBdktEG9Y) - computing running times for combinatorial optimization

There are many other types of problems in computer science and related areas. Here are a few:

1. Search Problems: The goal here is to find a solution that satisfies a certain condition. Examples of complexity classes for search problems include FP and FNP, the counterparts of the decision classes P and NP.

2. Function Problems: These are problems where you're asked to compute a function rather than make a decision. FP is a common complexity class for function problems.

3. Promise Problems: These are problems that come with a guarantee, or "promise," that the input will satisfy a certain property.

4. Eigenvalue Problems: These problems involve finding the eigenvalues of a given operator or matrix. This is a key problem in several areas of physics and computer science, including quantum computing.

5. Differential Equations: Problems that involve finding solutions to differential equations, which are equations involving derivatives of a function.

6. Constraint Satisfaction Problems: Here, the goal is to find an assignment of values to variables that satisfies a set of constraints.

7. Game Theory Problems: These problems involve decision-making situations where the outcome depends on the strategies chosen by multiple "players."

8. Packing and Covering Problems: In these problems, the objective is to pack (or cover) items into (or with) the smallest number of containers (or elements).

9. Scheduling Problems: These involve allocating resources to tasks over time.

10. Routing Problems: These problems involve finding the most efficient routes for things like deliveries, telecommunications data packets, or airline schedules.

11. Problems in Cryptography: These involve things like factoring large numbers, discrete logarithms, etc., which underpin the security of many cryptographic systems.

Remember, many problems can be phrased as different types of problems. For example, an optimization problem could be rephrased as a decision problem by asking whether there exists a solution with a value better than a certain threshold. Different formulations might make the problem easier or harder to solve.
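
The rephrasing of an optimization problem as a decision problem can be sketched with a small knapsack instance. This is only an illustration under stated assumptions: the brute-force `decide` function, the item data, and non-negative integer values with a non-negative capacity are all choices made here, not part of the notes. Binary search over the threshold recovers the optimum using only yes/no answers.

```python
from itertools import combinations


def decide(values, weights, capacity, threshold):
    """Decision version: is there a subset with weight <= capacity and value >= threshold?"""
    items = range(len(values))
    for r in range(len(values) + 1):
        for subset in combinations(items, r):
            if (sum(weights[i] for i in subset) <= capacity
                    and sum(values[i] for i in subset) >= threshold):
                return True
    return False


def optimise_via_decision(values, weights, capacity):
    """Recover the optimal value by binary search on the decision threshold."""
    lo, hi = 0, sum(values)  # threshold 0 is always reachable via the empty subset
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if decide(values, weights, capacity, mid):
            lo = mid        # a solution with value >= mid exists
        else:
            hi = mid - 1    # no solution reaches mid
    return lo


print(optimise_via_decision([60, 100, 120], [10, 20, 30], 50))
```

The binary search needs only about log2(sum of values) oracle calls, which is why the decision and optimization versions are usually regarded as equally hard up to polynomial factors.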

> ${\mathsf {P\subseteq BPP\subseteq BQP\subseteq PP\subseteq PSPACE}}$

from: [Quantum complexity theory (wiki)](https://en.m.wikipedia.org/wiki/Quantum_complexity_theory) - Two important quantum complexity classes are BQP and QMA

DQC1 hard: https://en.m.wikipedia.org/wiki/One_Clean_Qubit

statistical zero knowledge

BQP/qpoly


HeurBQP/qpoly ⊆ HeurQMA/poly

(The Learnability of Quantum States, 2004)

https://www.quantamagazine.org/a-short-guide-to-hard-problems-20180716/

https://www.quantamagazine.org/complexity-theorys-50-year-journey-to-the-limits-of-knowledge-20230817/

https://www.quantamagazine.org/edit-distance-reveals-hard-computational-problems-20150929/

The new algorithm has moved graph isomorphism much closer to P than ever before. Babai speculated in his first talk that the problem may lie just outside its borders, in the suburbs rather than the city center.

https://www.quantamagazine.org/algorithm-solves-graph-isomorphism-in-record-time-20151214/

Five worlds of Hardness

https://www.quantamagazine.org/which-computational-universe-do-we-live-in-20220418/

[Computational complexity theory](https://en.m.wikipedia.org/wiki/Computational_complexity_theory)


https://de.m.wikipedia.org/wiki/Kombinatorische_Optimierung

*Time Complexity vs Space Complexity*

[Time Complexity](https://en.m.wikipedia.org/wiki/Time_complexity): Big O notation. The time complexity is the [computational complexity](https://en.m.wikipedia.org/wiki/Computational_complexity) that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm. See also [Time hierarchy theorem](https://en.m.wikipedia.org/wiki/Time_hierarchy_theorem).

[Space complexity](https://en.m.wikipedia.org/wiki/Space_complexity) is generally expressed as the amount of memory required by an algorithm on an input of size n. See also [Space hierarchy theorem](https://en.m.wikipedia.org/wiki/Space_hierarchy_theorem).

*Average-case complexity vs Worst-case complexity*

https://en.m.wikipedia.org/wiki/Worst-case_complexity

https://en.m.wikipedia.org/wiki/Average-case_complexity

*Complexity Classes*

https://en.m.wikipedia.org/wiki/Space_hierarchy_theorem

https://en.m.wikipedia.org/wiki/Time_hierarchy_theorem

meta-complexity: https://www.quantamagazine.org/complexity-theorys-50-year-journey-to-the-limits-of-knowledge-20230817/

**SAT**

https://de.m.wikipedia.org/wiki/Erf%C3%BCllbarkeitsproblem_der_Aussagenlogik

https://www.scinexx.de/news/technik/neuer-loesungsansatz-fuer-mathematische-sat-probleme/

[Turing and the Halting problem (machine)](https://youtu.be/macM_MtS_w4)

Why are most decision problems uncomputable? Proof: https://youtu.be/moPtwq_cVH8
* The answer to a decision problem is binary (chess, Tetris, halting problem, negative-weight cycle detection)
* Decision problems are as hard as optimization problems
* Why are most decision problems uncomputable? Proof with set theory:
    * Define a program: the space of all possible programs can be thought of as the set of finite binary strings. A binary string can in turn be read as a number, so a program ≈ a natural number, an element of N.
    * Define a decision problem: a function that maps inputs to yes (1) or no (0). The input is a binary string, i.e. an element of N (the natural numbers), so a decision problem is a function from N to {0, 1}. **Every infinite string of bits represents a decision problem.** The table of outputs is infinite! **A program is a finite string of bits.**
    * You can write down a table of all answers: **a decision problem is an infinite string of bits**, e.g. 110001010100010111... **A program is a finite string of bits.** So they are different.
    * One way to see the difference is to put a binary point in front: .110001010100010111... Now the infinite string of output bits of a decision problem is a real number between 0 and 1 (written in binary). Any real number in this interval can be represented by an infinite string of bits.
    * A decision problem corresponds to an element of R, whereas a program corresponds to an element of N (the set of natural numbers).
    * But |R| >> |N|: R is uncountably infinite and N is countably infinite, so **there are way more problems than there are programs to solve them**; almost every problem is unsolvable by any program.
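
The counting argument above rests on Cantor's diagonal construction, which can be illustrated on a finite list. This is only a toy sketch: the four "programs" are hypothetical total functions from N to {0, 1}, whereas the real argument runs over an infinite enumeration of all programs. The constructed problem disagrees with program i on input i, so no program in the list solves it.

```python
def diagonal_problem(programs):
    """Build a decision problem that differs from programs[i] on input i."""
    def problem(i: int) -> int:
        return 1 - programs[i](i)  # flip the i-th diagonal entry
    return problem


# Hypothetical example programs, each mapping a natural number to 0 or 1.
programs = [
    lambda n: 0,
    lambda n: 1,
    lambda n: n % 2,
    lambda n: 1 if n % 3 == 0 else 0,
]

diag = diagonal_problem(programs)
for i, program in enumerate(programs):
    print(i, program(i), diag(i))
```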

**Introduction**
* [Quantum_complexity_theory](https://en.m.wikipedia.org/wiki/Quantum_complexity_theory)
* Starting point: [quantum many body problem](https://en.m.wikipedia.org/wiki/Many-body_problem): it is believed that it is exponentially difficult for classical computers to simulate quantum systems ([Source: Quantum_threshold_theorem](https://en.m.wikipedia.org/wiki/Quantum_threshold_theorem#Notes)). Quantum computers can simulate many Hamiltonians in [polynomial time with bounded errors](https://en.m.wikipedia.org/wiki/BQP) (BQP): chemical simulations, drug discovery, energy production, [climate modeling](https://en.m.wikipedia.org/wiki/Climate_model) and fertilizer production (e.g. [FeMoco](https://en.m.wikipedia.org/wiki/FeMoco)).
* Scott Aaronson
  * published an excellent article ["The Limits of Quantum Computers"](https://www.scientificamerican.com/article/the-limits-of-quantum-computers/): **Quantum computers may provide a massive speedup for problems in the BQP complexity class. Nothing more**. They are no more powerful than a Turing machine (in terms of the problems that can be solved)
  * [The_Limits_of_Quantum_Computers](https://www.cs.virginia.edu/~robins/The_Limits_of_Quantum_Computers.pdf)
  * [How Big are Quantum States?](https://www.scottaaronson.com/democritus/lec13.html)
* [Quantamagazine: Finally, a Problem That Only Quantum Computers Will Ever Be Able to Solve](https://www.quantamagazine.org/finally-a-problem-that-only-quantum-computers-will-ever-be-able-to-solve-20180621/)
* [Quantum_supremacy: Computational_complexity](https://en.m.wikipedia.org/wiki/Quantum_supremacy#Computational_complexity)
* [Nondeterministic_Turing_machine](https://en.m.wikipedia.org/wiki/Nondeterministic_Turing_machine)
* Iterative improvements in classical algorithms have so far not succeeded in turning a hard problem into an easy one. This is where quantum computing comes in: an “easy” problem that a quantum computer can solve in polynomial time belongs to the class BQP (Bounded-error Quantum Polynomial time), and a hard problem whose solutions can only be verified in polynomial time belongs to the class QMA (the playfully named Quantum Merlin Arthur, https://en.m.wikipedia.org/wiki/QMA). The hope in the field is that there is some overlap between the space of NP problems and BQP problems: that by leveraging quantum resources like superposition and entanglement, a hard problem can be transformed into an easy one.

**Big-O Notation (Algorithmic Efficiency / Runtime)**

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1273.png)

* [Examples of runtime with big O](https://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly)
* Examples of [Computational complexity of mathematical operations](https://en.m.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra)
* [Brilliant: Complexity Theory](https://brilliant.org/wiki/complexity-theory/)
* [Theory of computation](https://en.m.wikipedia.org/wiki/Theory_of_computation)
* [Big O notation (Landau)](https://en.m.wikipedia.org/wiki/Big_O_notation): Big O gives an asymptotic upper bound (often quoted for the worst case) / Big Ω (Omega) gives an asymptotic lower bound / Big Θ (Theta) gives a tight bound
* [Time complexity](https://en.m.wikipedia.org/wiki/Time_complexity), Space Complexity = Auxiliary Space + Space used for input values, [Space time tradeoff](https://en.m.wikipedia.org/wiki/Space–time_tradeoff), [Garbage collection](https://de.m.wikipedia.org/wiki/Garbage_Collection), [Time and Space Complexity Analysis of Algorithm](https://afteracademy.com/blog/time-and-space-complexity-analysis-of-algorithm), [How to compute Time Complexity or Order of Growth of any program](https://www.rookieslab.com/posts/how-to-compute-time-complexity-order-of-growth-of-any-program)
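
Counting elementary operations, as described for time complexity above, can be made concrete by instrumenting two search routines. The sketch below is illustrative only (function names and the test input are choices made here). It counts the elements inspected by a linear search, which is O(n), and by a binary search on a sorted list, which is O(log n).

```python
def linear_search_steps(sorted_items, target) -> int:
    """Number of elements inspected by a left-to-right scan."""
    steps = 0
    for x in sorted_items:
        steps += 1
        if x == target:
            break
    return steps


def binary_search_steps(sorted_items, target) -> int:
    """Number of elements inspected by binary search on a sorted list."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


items = list(range(1024))
print(linear_search_steps(items, 1023), binary_search_steps(items, 1023))
```

Doubling the input roughly doubles the first count but adds only one step to the second.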


**Special: Computation and Consciousness**
* [Integrated information theory](https://en.m.wikipedia.org/wiki/Integrated_information_theory) - Is consciousness reducible to computation?
* [Hard problem of consciousness](https://en.m.wikipedia.org/wiki/Hard_problem_of_consciousness) and [Pretty hard problem](https://forum.effectivealtruism.org/posts/Qiiiv9uJWLDptH2w6/the-pretty-hard-problem-of-consciousness)
* [p Zombie (Stanford)](https://plato.stanford.edu/entries/zombies/)
* [Graph expansion](https://de.m.wikipedia.org/wiki/Expander-Graph)
* [Theoretical_computer_science](https://en.m.wikipedia.org/wiki/Theoretical_computer_science)
* [Computability_theory](https://en.m.wikipedia.org/wiki/Computability_theory)
* [Computational_complexity_theory](https://en.m.wikipedia.org/wiki/Computational_complexity_theory)
* Lattice based crypto systems
* linear cross entropy benchmark
* Quantum pcp problem
* busy beaver of 6: biggest number (incl. goldbach conjecture)
* It from qubit
* Violation of Bell inequality experiment in 2015: it closed the locality loophole and the detection loophole; the physical reality of entanglement is proven. Open: superdeterminism.
* Church turing thesis
* Unknown laws of quantum gravity - uncomputable (eg with an oracle for a quantum problem)



**Categorization of Optimization Problem with different orders of their objective function**
* Introduction: [Optimization_problem](https://en.m.wikipedia.org/wiki/Optimization_problem)
* **First order**: [Linear optimization](https://en.m.wikipedia.org/wiki/Linear_programming): Many practical problems in [operations research](https://en.m.wikipedia.org/wiki/Operations_research) can be expressed as linear programming problems. See [Simplex_algorithm](https://en.m.wikipedia.org/wiki/Simplex_algorithm)
* **Second Order**: [Quadratic optimization (Quadratic programming)](https://en.m.wikipedia.org/wiki/Quadratic_programming): finance (for portfolio optimization), machine learning (for support vector machines), and operations research. Example: In the Markowitz model, the objective function to be minimized is the portfolio variance, which is a quadratic function of the decision variables (asset weights in the portfolio), given the covariance matrix of the returns of the assets.
* **Higher Order**: Polynomial optimization problems or non-linear optimization problems (e.g. optimizing the design of a machine part, where the objective function includes terms related to the volume of material used, which might be cubic if we're considering three-dimensional parts). These often require more advanced techniques, such as [interior-point methods](https://de.m.wikipedia.org/wiki/Innere-Punkte-Verfahren), branch-and-bound techniques, or heuristic methods.

**QAOA** (Ryan)

https://arxiv.org/pdf/2011.04149.pdf

https://arxiv.org/pdf/2004.04197.pdf

https://journals.aps.org/prx/abstract/10.1103/PhysRevX.6.031015

https://journals.aps.org/prxquantum/abstract/10.1103/PRXQuantum.1.020312

10^53

The research team has done a lot of work on optimisation

Killer app: almost all businesses have logistics, scheduling, load balancing, portfolio optimisation. It is the holy grail application because it is so broad

The team pays some attention to this topic; the focus was strongest in 2013, when the approach was annealing / adiabatic QC

In the real world: problems on only a couple of hundred bits can be difficult to solve well classically. They are formally NP-hard, and deterministic algorithms formally scale exponentially. Classical heuristic algorithms (simulated annealing) are very good, but have exponential runtime. In contrast, exact algorithms run too long in practice, so in practice heuristics are used more.

Just hundreds of bits: classically easy. Thousands of bits are needed for truly challenging instances. Optimisation variables are assumed binary. But normally they are real variables or multi-valued, and mapping them to binary qubits is difficult.

The other thing that makes it so difficult is density: the cost function is expressed as a sum of products of bits. A sum of pairwise products of bits is a quadratic optimisation problem, and quantum annealing can only solve quadratic optimisation problems. Three- or four-body (bit) terms can be mapped to a quadratic function with two-qubit terms, but this causes an overhead that grows asymptotically, like n to n^2 bits.
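
A cost function written as a sum of pairwise products of bits, as described above, can be sketched as follows. The coefficients and function names are hypothetical, chosen only to give a small instance, and the brute-force search stands in for any real solver; it is feasible only for a handful of bits.

```python
from itertools import product


def qubo_cost(bits, linear, quadratic):
    """Cost = sum_i linear[i] * b_i + sum_{(i, j)} quadratic[(i, j)] * b_i * b_j."""
    cost = sum(linear[i] * bits[i] for i in range(len(bits)))
    cost += sum(w * bits[i] * bits[j] for (i, j), w in quadratic.items())
    return cost


def brute_force_minimum(n, linear, quadratic):
    """Try all 2^n bit strings; the runtime is exponential in n."""
    return min(product((0, 1), repeat=n),
               key=lambda b: qubo_cost(b, linear, quadratic))


# Hypothetical instance: reward each set bit, penalise adjacent pairs.
linear = [-1, -1, -1]
quadratic = {(0, 1): 2, (1, 2): 2}

best = brute_force_minimum(3, linear, quadratic)
print(best, qubo_cost(best, linear, quadratic))
```

Each key of `quadratic` corresponds to one edge of the problem graph, so the density of that dictionary is the connectivity discussed in these notes.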

Connectivity of the graph: some optimisation problems only couple direct neighbours on a grid, and these turn out to be easy to solve classically because of the local nature of the connections. So you need problems where the graph is not sparse and is non-local. If you use annealing, mapping the problem Hamiltonian to the hardware graph causes tons of overhead in qubits to make it planar. With a digital approach such as QAOA or fault-tolerant computation, it increases the number of swap gates (but not bits).

Contrast with quantum chemistry simulation: the true quantum nature is not easy to solve even on 16 or 18 bits; the solution is a single bit string (expressing). Optimisation: finding the solution is easy, but you can’t express it easily (n^80); you can’t write down the quantum wave function. You need 10^100 more bits than you would for other applications in simulation, like quantum chemistry, than for optimisation.

Annealers have some advantages for long-range couplings as the density of the graph increases, but this is not practical to engineer; our devices have this property. It requires the number of wires between qubits to increase with more qubits, which is impracticable.

For error-corrected optimisation: we don’t have the computers, but what can we prove? The problems are non-oracular -

You have to go to astronomical sizes to gain an advantage from error-corrected computation with quadratic speedups

Introduce the energy landscape to the quantum computer: a diagonal Hamiltonian given by the cost function, in all forms of quantum optimisation. Every cost function has to have order-n terms in it (edges); you need high connectivity

Many hard problems are not even graphs; n^3 doesn’t need any edges, but with n^1.5 you can’t have a sparse graph. N needs to be of the order of thousands (number of bits), so the number of edges is n^1.5 = 160,000, and each of those edges needs at least x CZs. 300,000 edges (with rotations?); the QC has 0.995 fidelity: with 300,000 gates, raise that to the power of 300,000 -> it goes to zero, 10^-220. With 5 times n edges, we have 5,000 edges, 2 CZs per edge = 10,000 gates. I get 10^-22 fidelity. The number of circuit repetitions required before seeing one error-free run is one over that, 5^25; repeating the QC so many times is not going to happen.

NISQ is not scalable to 2000 qubits, not for simulation, not for optimisation; you have to have error correction. 50 - 150 bits would be classically intractable; 0.995^300 = 0.2 fidelity, that sounds reasonable. Or 0.995^2000: we used a hundred bits, plus edges with 2 CZs per edge, 2000 gates as a result with 0.995 fidelity, 4 * 10^-5 = 22,000 samples in 2 seconds. Something with a hundred bits is OK, with thousands of bits is incredible.
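
The fidelity arithmetic in these notes can be reproduced with a very simple error model, assumed here for illustration: every gate succeeds independently with the same fidelity, so the circuit fidelity is the gate fidelity raised to the number of gates. Working in log10 avoids floating-point underflow for large circuits. The function names are illustrative.

```python
import math


def circuit_fidelity_log10(gate_fidelity: float, n_gates: int) -> float:
    """log10 of gate_fidelity ** n_gates, under the independent-error assumption."""
    return n_gates * math.log10(gate_fidelity)


def repetitions_log10(gate_fidelity: float, n_gates: int) -> float:
    """log10 of the expected number of runs per error-free run (1 / fidelity)."""
    return -circuit_fidelity_log10(gate_fidelity, n_gates)


for n_gates in (300, 2_000, 10_000, 300_000):
    log_f = circuit_fidelity_log10(0.995, n_gates)
    print(f"{n_gates:>7} gates: fidelity ~ 10^{log_f:.1f}, "
          f"runs needed ~ 10^{repetitions_log10(0.995, n_gates):.1f}")
```

Under this model the 300, 2,000 and 10,000 gate cases agree with the figures in the notes, while the 300,000 gate case comes out far smaller than 10^-220, so that figure may refer to a different gate count or fidelity.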

Comparing the runtime of exact classical algorithms with that of heuristic algorithms is wrong!!

There may be a future for optimisation, but simply applying QAOA to a new problem is not going to change the speedup; you really need a close match between the structure of the algorithm and the problem.

###### *Computability Theory*

**Formal language**

In logic, mathematics, computer science, and linguistics, a [formal language](https://en.m.wikipedia.org/wiki/Formal_language) consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Syntax_tree.svg/280px-Syntax_tree.svg.png)

*Structure of the syntactically well-formed, although nonsensical, English sentence, "Colorless green ideas sleep furiously" (historical example from Chomsky 1957)*

**Chomsky-Hierarchy**

Subset relationships between the language classes in the [Chomsky hierarchy (German article)](https://de.m.wikipedia.org/wiki/Chomsky-Hierarchie), see also the English article [Chomsky hierarchy](https://en.m.wikipedia.org/wiki/Chomsky_hierarchy)

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a6/Chomsky-Hierarchie.svg/320px-Chomsky-Hierarchie.svg.png)

**Automata theory**

[Automata theory](https://en.m.wikipedia.org/wiki/Automata_theory) is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science with close connections to mathematical logic. Automata theory is closely related to formal language theory.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1648.jpg)

1. [Combinational Logic](https://en.m.wikipedia.org/wiki/Combinational_logic)

2. [Finite State Machine](https://en.m.wikipedia.org/wiki/Finite-state_machine), e.g. [Deterministic Finite Automaton](https://en.m.wikipedia.org/wiki/Deterministic_finite_automaton) and [Quantum Finite Automaton](https://en.m.wikipedia.org/wiki/Quantum_finite_automaton). Finite automata: pattern recognition, regular expressions.

3. [Pushdown Automaton](https://en.m.wikipedia.org/wiki/Pushdown_automaton). Pushdown automata: context free, like parsing.

4. [Turing Machine](https://en.m.wikipedia.org/wiki/Turing_machine) and [Quantum Turing Machine](https://en.m.wikipedia.org/wiki/Quantum_Turing_machine). Turing machines: recursively enumerable languages, general computation.
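
The simplest machine with state in the list above, the deterministic finite automaton, fits in a few lines. The following sketch uses a hypothetical two-state automaton that accepts binary strings containing an even number of 1s; the state names and the dictionary encoding of the transition function are illustrative choices.

```python
def run_dfa(transitions, start, accepting, word) -> bool:
    """Run a DFA given as {(state, symbol): next_state} and report acceptance."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


# Hypothetical DFA: tracks the parity of the number of 1s read so far.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

for word in ("", "1001", "1011"):
    print(repr(word), run_dfa(EVEN_ONES, "even", {"even"}, word))
```

The automaton has no memory beyond its current state, which is why it recognises only regular languages; a pushdown automaton adds a stack and a Turing machine adds an unbounded tape.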

Video: [Automata Theory](https://www.youtube.com/watch?v=YrEnAg2gQoU&t=7125)

A [quantum Turing machine (QTM)](https://en.m.wikipedia.org/wiki/Quantum_Turing_machine) or universal quantum computer is an abstract machine used to model the effects of a quantum computer. However, the computationally equivalent [quantum circuit](https://en.m.wikipedia.org/wiki/Quantum_circuit) is a more common model.

**Types of programs**
1. [Undecidable function](https://en.m.wikipedia.org/wiki/Undecidable_problem) with [List of undecidable problems](https://en.m.wikipedia.org/wiki/List_of_undecidable_problems)
2. [Recursively enumerable function](https://en.m.wikipedia.org/wiki/Recursively_enumerable_language) (for some arguments you put into the function they will stop and give an answer and for others they will go on forever)
3. [Recursive functions](https://en.m.wikipedia.org/wiki/General_recursive_function), not every total recursive function is a primitive recursive function—the most famous example is the Ackermann function.
4. [Primitive Recursive function](https://en.m.wikipedia.org/wiki/Primitive_recursive_function) (incl. every other program that isn’t recursive, like something going through a sequence, a for loop, or nested for loops)
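
The Ackermann function mentioned above is total and computable but not primitive recursive. A minimal sketch of the common two-argument (Ackermann-Péter) definition follows; it should only be called with very small arguments, because both the values and the recursion depth grow extremely fast.

```python
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann-Péter function; use small m and n only."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(2, 3), ackermann(3, 3))
```

The nested call in the last line is what prevents the function from being written with a fixed number of bounded for loops.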

> See also [Chomsky Hierarchy (of Turing machines)](https://en.m.wikipedia.org/wiki/Chomsky_hierarchy)
* type 0: recursively enumerable (the most general): busy beaver (such programs may never stop)
* recursive (decidable) languages form a subclass of type 0: Ackermann (these programs always terminate)
* type 1: context-sensitive. Needs RAM, but you can predict how much RAM (Turing machines with a predictable and finite amount of RAM requirements)
* type 2: context-free. No RAM needed, only a stack

* See video from computerphile: https://youtu.be/224plb3bCog
- https://en.m.wikipedia.org/wiki/Computably_enumerable_set
- https://en.m.wikipedia.org/wiki/Enumeration_algorithm
- https://en.m.wikipedia.org/wiki/Recursion_(computer_science)
- Video: [The Most Difficult Program to Compute? - Computerphile](https://www.youtube.com/watch?v=i7sm9dzFtEI)
* **Recursion vs Iteration**
  * In simple terms, an iterative function loops to repeat some part of the code, and a recursive function calls itself to repeat the code.
  * Iteration repeats a set of statements without the overhead of function calls and without growing the call stack. In languages without tail-call optimization it is usually faster and more memory-efficient than the equivalent recursion. The choice between the two does not change the time complexity of the underlying algorithm.
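
To make the recursion-versus-iteration contrast concrete, here is a small sketch that computes the factorial both ways. The function names are illustrative only.

```python
def factorial_recursive(n: int) -> int:
    """Calls itself once per step; every call adds a stack frame."""
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n: int) -> int:
    """Loops instead of recursing; uses a constant number of stack frames."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial_recursive(10), factorial_iterative(10))
```

Both versions perform the same number of multiplications. The difference is that the recursive one is limited by the interpreter's recursion depth, while the iterative one is not.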

**What does computability mean?**

* https://en.m.wikipedia.org/wiki/Model_of_computation

* https://en.m.wikipedia.org/wiki/Theory_of_computation

* A computable **number** is a real number that can be calculated to any desired precision by a finite, terminating algorithm. In other words, there is a computer program that can produce the digits of the number, one by one, to any desired level of accuracy. A number is non-computable if there is no such program. This means that there is no way to calculate the digits of the number exactly, no matter how much time or resources you have.

  * A real number is computable if there exists an algorithm that can approximate this real number to any degree of precision. Equivalently, but maybe more intuitively, if there is an algorithm such that if you run it long enough, you can obtain any digit in its decimal expansion.

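  * **Surprisingly, almost no real numbers are computable, despite the fact that almost all numbers you ever work with are.** The reason is a counting argument: there are only countably many programs but uncountably many real numbers.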

* A computable **function** is one for which some algorithm produces the output in a finite number of steps for every input in its domain. See [Berechenbarkeit](https://de.m.wikipedia.org/wiki/Berechenbarkeit)
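
As an illustration of a computable number, the sketch below produces as many decimal digits of $\sqrt{2}$ as requested, using only exact integer arithmetic. The helper name `sqrt2_digits` is made up for this example.

```python
from math import isqrt


def sqrt2_digits(n_digits: int) -> str:
    """Return sqrt(2) truncated to n_digits decimal places (n_digits >= 1)."""
    if n_digits < 1:
        raise ValueError("n_digits must be at least 1")
    # floor(sqrt(2) * 10**n_digits), computed exactly on integers
    scaled = isqrt(2 * 10 ** (2 * n_digits))
    text = str(scaled)
    return text[0] + "." + text[1:]


print(sqrt2_digits(10))  # 1.4142135623
```

Because the program terminates for every requested precision, $\sqrt{2}$ is computable in the sense described above, even though its decimal expansion never ends.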

**Church-Turing thesis**: every algorithm can be thought of as a Turing machine, so the thesis applies to all effectively computable functions. In general you cannot shortcut a computation to learn its outcome in advance, as the halting problem shows.

> See [Undecidable problem](https://en.m.wikipedia.org/wiki/Undecidable_problem) and [List of undecidable problems](https://en.m.wikipedia.org/wiki/List_of_undecidable_problems)

[Computability_theory](https://en.m.wikipedia.org/wiki/Computability_theory):

* The Busy Beaver function Σ(n) grows faster than any computable function. Hence, it is not computable; only a few values are known

* Here are some of the applications of computability theory:

  * The halting problem: The halting problem is the problem of determining whether a given computer program will halt. The halting problem is undecidable, which means that there is no algorithm that can solve the halting problem.
  * The complexity of problems: Computability theory can be used to study the complexity of problems. For example, we can use computability theory to determine whether a problem is decidable or undecidable.
  * The development of new algorithms: Computability theory can be used to develop new algorithms. For example, we can use computability theory to develop algorithms that can solve decidable problems efficiently.
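
The undecidability of the halting problem rests on a diagonal argument, which can be sketched in code. This is only an illustration: `make_contrarian` and `always_says_loops` are hypothetical names, and the candidate decider is deliberately trivial, since a correct one cannot exist.

```python
from typing import Callable

Program = Callable[[], None]


def make_contrarian(halts: Callable[[Program], bool]) -> Program:
    """Build a program that does the opposite of what `halts` predicts about it."""

    def contrarian() -> None:
        if halts(contrarian):
            while True:  # predicted to halt, so loop forever instead
                pass
        # predicted to loop forever, so halt immediately

    return contrarian


def always_says_loops(program: Program) -> bool:
    """A (wrong) candidate decider that claims no program ever halts."""
    return False


program = make_contrarian(always_says_loops)
program()  # returns at once, so the prediction "never halts" was wrong
print("the candidate decider was wrong about its contrarian program")
```

Whatever candidate is plugged in, its contrarian program behaves contrary to the prediction, so no candidate can be right on all inputs. A candidate that answers `True` would be refuted by a program that never returns, which is why that case is not executed here.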

*See also: [Pfeilschreibweise](https://de.m.wikipedia.org/wiki/Pfeilschreibweise), [Hyper-Operator](https://de.m.wikipedia.org/wiki/Hyper-Operator), [Potenzturm](https://de.m.wikipedia.org/wiki/Potenzturm), [Long_and_short_scales](https://en.m.wikipedia.org/wiki/Long_and_short_scales)*

**Gödel, Turing, and Church are three of the most important figures in the history of computability theory. Their work has had a profound impact on our understanding of what it means for a function to be computable.**

* Kurt Gödel was a logician who made several important contributions to computability theory. In his 1931 paper, On Formally Undecidable Propositions of Principia Mathematica and Related Systems, Gödel showed that any consistent formal system expressive enough to describe arithmetic contains statements that can be neither proven nor disproven within that system. This result, known as Gödel's first incompleteness theorem, is closely related to computability theory: **there are some functions that cannot be computed by any Turing machine (the halting problem is the standard example, and the Turing degrees classify how uncomputable a problem is).** The Collatz conjecture is sometimes mentioned in this context, but it is an open problem, not a known undecidable one.

* Alan Turing was a mathematician and computer scientist who is considered to be the father of theoretical computer science. In his 1936 paper, On Computable Numbers, with an Application to the Entscheidungsproblem, Turing introduced the notion of a Turing machine, which is a theoretical model of computation, and showed that the Entscheidungsproblem has no solution. The Church-Turing thesis states that any function a human being can compute with paper and pencil by following a fixed procedure can also be computed by a Turing machine. It is a thesis rather than a theorem, and it is one of the most important foundations of computability theory.

* Alonzo Church was an American mathematician who made several important contributions to computability theory. In his 1936 paper, An Unsolvable Problem of Elementary Number Theory, Church used the lambda calculus, which is a formal system for representing functions. The lambda calculus is equivalent to the Turing machine model of computation, which means that any function that can be computed by a Turing machine can also be computed in the lambda calculus, and vice versa.

The work of Gödel, Turing, and Church has had a profound impact on our understanding of computability theory. Their results have shown that there are some functions that cannot be computed by any Turing machine, and they support the thesis that anything a human being can compute with paper and pencil can also be computed by a Turing machine.

**Computability Theory vs Computational Complexity Theory**

The main difference between computational complexity theory and computability theory is that **computability theory** is concerned with what can be computed, while **computational complexity theory** is concerned with how efficiently it can be computed.

* **Computability theory** studies the theoretical limits of what can be computed by an idealized computing device, such as a Turing machine. It asks questions such as:
    * What are the fundamental problems that cannot be solved by any computer?
    * Which problems can be solved by a computer at all, given unlimited time and space?

* **Computational complexity theory** studies the resources (time, space, etc.) required to solve different computational problems. It asks questions such as:
    * How does the complexity of a problem change as the size of the input increases?
    * Which problems can be solved in polynomial time, and which only with an exponential amount of time or space?
    * What are the most efficient algorithms for solving different problems?
    * How can we design algorithms that are efficient for a wide range of problems?

Computability theory is a foundational discipline in computer science, while computational complexity theory is a more applied discipline. Computability theory provides the theoretical foundation for computational complexity theory, and many of the problems studied in computational complexity theory are motivated by questions in computability theory.

| Feature | Computational Complexity Theory | Computability Theory |
|---|---|---|
| Focus | How efficiently can a problem be solved | What problems can be solved |
| Resource | Time and space | Not bounded; only whether a finite computation exists |
| Model of computation | Turing machine | Turing machine, finite automata, etc. |
| Problems studied | NP-complete problems, NP-hard problems, P problems, etc. | Halting problem, Entscheidungsproblem, etc. |


**Flow of Compilation in Programming Languages**

* First generation: machine code (binary instructions)
* Second generation (low-level programming language): assembly language (with hexadecimal), such as Intel x86
    * https://de.m.wikibooks.org/wiki/Assembler-Programmierung_für_x86-Prozessoren/_Druckversion
    * https://de.m.wikipedia.org/wiki/Hexadezimalsystem
* Third generation (high- to mid-level programming language): Fortran, C, C++ (Is Fortran a higher-level language than C? - *Fortran and C are both programming languages. Fortran is typically considered a higher-level language than C because it is more specialized for numerical computation and scientific computing. On the other hand, C is considered a lower-level language because it is more general-purpose and is often used for systems programming. However, both languages have their own strengths and weaknesses, and the choice of which one to use depends on the specific task or project that you are working on.*)
* Scripted / interpreted languages: Python, Java, shell, Perl
* Today: many programming languages are based on the lambda calculus (Church)

https://medium.com/beereal/the-real-man-codes-in-assembler-d9dfd5eb6db6

###### *Physical Boundaries of Computation*

*Limits from **Physics** (thermodynamics, quantum mechanics), from **Computer Science** (Computability theory, Complexity theory, Information science) and from **Mathematics** (number theory). Limits are derived from simple questions:*

1. <font color="blue">What is the max limit of information capacity?</font> (per given volume and in the observable universe) - Bekenstein bound, about 2.6 × 10$^{43}$ bits per kg of mass and meter of radius
2. <font color="blue">What is the max physical speed limit of computation?</font> - Bremermann's limit, about 10$^{50}$ bits per second per kilogram (from mass-energy equivalence and the uncertainty principle: a fundamental limit to how fast information can be processed, even in a perfect computer)
3. <font color="blue">What is the physical limit of computation (max number of operations) of the universe in its entire lifetime?</font> - 10$^{229}$ operations (if all matter in the observable universe were turned into a black hole computer)
4. <font color="blue">What is the max physical time limit of computation?</font> - Poincare recurrence time (the universe resets itself, e.g. before Graham's number could be written down)
5. <font color="blue">What is the max physical resolution limit of the universe?</font> - 10$^{185}$ Planck volumes. Anything bigger than this number cannot be explained in physical terms.
6. <font color="blue">Can I find another me of myself in this universe?</font> - No! $10^{{10}^{70}}$ is the number of all possible quantum states a person can occupy (roughly 1 m$^3$ of space), that is, of arrangements of atoms like the one that makes you. If you walked $10^{{10}^{70}}$ meters, you would start to see repetitions of yourself. But the number of protons in the observable universe is only about $10^{80}$ ([Eddington number](https://en.m.wikipedia.org/wiki/Eddington_number)) and the max resolution is only 10$^{185}$ (Planck volumes).

See also: [Limits of computation (wikipedia)](https://en.m.wikipedia.org/wiki/Limits_of_computation)
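
The Bekenstein bound from item 1 can be evaluated numerically. The sketch below uses the standard form $H \leq 2\pi c R M / (\hbar \ln 2)$ with SI constants; the function name and the choice of a 1 kg, 1 m system are assumptions made for illustration.

```python
import math

C = 299_792_458.0         # speed of light in m/s (exact SI value)
HBAR = 1.054_571_817e-34  # reduced Planck constant in J*s


def bekenstein_bits(mass_kg: float, radius_m: float) -> float:
    """Upper bound on the information (in bits) inside a sphere of the given
    radius that contains the given mass."""
    return 2 * math.pi * C * radius_m * mass_kg / (HBAR * math.log(2))


print(f"{bekenstein_bits(1.0, 1.0):.4e} bits for 1 kg within 1 m")
```

For 1 kg within a radius of 1 m this gives roughly $2.58 \times 10^{43}$ bits, which is where the $10^{43}$ figure comes from.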

**Selection of physical limits relevant to computation:**
* <font color="blue">$10^{-44}$ seconds max speed to write a single symbol - [Planck time](https://de.m.wikipedia.org/wiki/Planck-Zeit). Important to know the limits how to compute large numbers: </font>

  * if you can write at most one symbol per Planck time, writing down **Graham's number** would take longer than one **Poincare recurrence time**: the universe would reset itself before you finished

  * to write down a **googol** ($10^{100}$) symbols at that rate you need $10^{56}$ seconds, roughly $10^{48}$ years. Do we have enough time for that? It depends on the nature of dark energy. In $10^{48}$ years we are in the **era of black hole dominance** (all matter has either disappeared into supermassive black holes or is unimaginably far apart) [Video](https://www.youtube.com/watch?v=X3l0fPHZja8)

* <font color="blue">$10^{-21}$ x 2.9 Joule: [Landauer’s principle](https://en.m.wikipedia.org/wiki/Landauer%27s_principle). Min amount of heat per erased bit that is dissipated when information is destroyed [Source](https://physicsworld.com/a/wiping-data-will-cost-you-energy). See also [Entropy in thermodynamics and information theory](https://en.m.wikipedia.org/wiki/Entropy_in_thermodynamics_and_information_theory)</font>

* $10^{11}$ years (100 billion): all galaxies other than Andromeda and the Milky Way will be outside of the visible universe

* $10^{12}$ years (1 trillion): the galaxies will be depleted of gas clouds, so the formation of new stars will be impossible (all hydrogen in the cores of stars is exhausted)

* $1.2 \times 10^{12}$ years (1.2 trillion): all stars in the universe will have exhausted their fuel (no more stars); only white dwarfs, neutron stars and black holes remain. [Source](https://www.youtube.com/watch?v=dsWfGzxjs0w)

* <font color="blue">$10^{15}$ years (1 quadrillion): all planets are detached from their solar systems</font>

* <font color="blue">$10^{16}$ bit is the total amount of information a typical human observer can possibly absorb during his lifetime [arXiv:0910.1589](https://arxiv.org/abs/0910.1589)</font>


* <font color="blue">$10^{17}$ (100 quadrillion) – The number of seconds since the Big Bang (important to estimate max limit of computation since beginning of universe)

* <font color="blue">$10^{18}$ max operations per second in human brain (and floating point operations, flops in 2020)</font>

* $10^{22}$ number of stars in the universe

* $10^{23}$ x 6.022: Avogadro number. Multiplied by mol$^{-1}$ it gives the [Avogadro constant](https://de.m.wikipedia.org/wiki/Avogadro-Konstante), the number of particles contained in one mole of a substance (about 602 sextillion particles per mole)

* <font color="blue">$10^{30}$ years: all remnants of stars will have fallen into the central galactic supermassive black hole</font>

* <font color="blue">$10^{30}$ x 3 years: if protons also decay, all atomic nuclei will have decayed. The black hole era of the universe starts: black holes are the only celestial objects in the entire universe.</font>

* <font color="blue">$10^{43}$ = [Bekenstein bound](https://en.m.wikipedia.org/wiki/Bekenstein_bound): maximal amount of information that can be contained within a given volume. More precisely: using mass-energy equivalence, the informational limit may be reformulated as follows, where M is the mass (in kg) and R is the radius (in meters) of the system: ${\displaystyle H\leq {\frac {2\pi cRM}{\hbar \ln 2}}\approx 2.5769082\times 10^{43}\ {\frac {\text{bit}}{{\text{kg}}\cdot {\text{m}}}}\cdot M\cdot R}$. You can also consider this as $10^{43}$ Hertz: a computer that operated at this many operations per second would use so much energy that it would simply collapse to a black hole - Moore’s law: ultimate limits imposed by quantum gravity (min 55:00 [video](https://www.youtube.com/watch?v=uX5t8EivCaM&t=2492s))</font>

* <font color="blue">$10^{50}$ bits per second per kilogram x 1.3563925 = [Bremermann's limit](https://en.m.wikipedia.org/wiki/Bremermann%27s_limit) - Max rate of data processing by an isolated material system. Physical limit of computation. It is derived from Einstein's mass-energy equivalence and the Heisenberg uncertainty principle, and is $c^2/h$.</font>

  * A computer the size of the Earth and operating at Bremermann's limit could do $10^{74}$ computations per second. It could break a 128-bit cryptographic key in less than $10^{-36}$ of a second and a 256-bit key in 2 minutes. However, for a 512-bit key, working at Bremermann's computational speed limit would take $10^{72}$ years. [Source at min 2:30](https://www.youtube.com/watch?v=ZDfaXJRtOoM). The universe, however, will probably exist for "only" trillions ($10^{12}$) or quadrillions ($10^{15}$) of years more.

* <font color="blue">$10^{50}$ x 5 operations per second. The "ultimate laptop": a 1 kg black hole with width = 3 x $10^{-27}$ meters would perform 5 x $10^{50}$ operations/sec, but has a lifetime of only $10^{-22}$ sec (due to Hawking radiation). Source: [Seth Lloyd: Black hole computer](https://www.scientificamerican.com/article/black-hole-computers-2007-04/). This is also the **final limit of computation, set by the Heisenberg uncertainty principle**, at $10^{50}$ operations per second (flops, not clock speed) [Source](https://www.youtube.com/watch?v=jv2H9fp9dT8), even when accounting for a black hole</font>

* <font color="blue">$10^{68}$ years: a black hole of the size of the sun will have decayed due to Hawking radiation</font>

* $10^{80}$ [Eddington number](https://en.m.wikipedia.org/wiki/Eddington_number): Number of protons in the observable universe. More precisely: 1.57 × $10^{79}$

* <font color="blue">$10^{80}$ x ∼6: number of bits of information stored in all the matter particles of the observable universe (where each particle in the observable universe contains 1.509 bits of information) [arxiv:0110141](https://arxiv.org/abs/quant-ph/0110141) and [arxiv:2112.04473](https://arxiv.org/abs/2112.04473)</font>

* $10^{90}$: Fill the universe with grains of sand (.5mm in diameter)

* <font color="blue">$10^{90}$ - max number of bits the universe can have processed [arxiv:0110141](https://arxiv.org/abs/quant-ph/0110141), based on the amount of information the Universe can register and the number of elementary operations that it can have performed over its history</font>

* <font color="blue">$10^{90}$ operations per second: if all matter in the observable universe were turned into a black hole computer, it could perform $10^{90}$ operations per second, but it would have a lifetime of 2.8 x $10^{139}$ seconds before Hawking radiation caused it to evaporate. In that time it could perform 2.8 x $10^{229}$ operations. Source: [Seth Lloyd: Black hole computer](https://www.scientificamerican.com/article/black-hole-computers-2007-04/)</font>


* $10^{106}$ - $10^{108}$ years it takes for black holes to evaporate due to Hawking radiation. Source: [The Crazy Future If Protons Don't Decay](https://www.youtube.com/watch?v=5XuBIyGqE1w). The universe enters the dark era. All physical objects have decayed into subatomic particles. If protons don't decay, look at $10^{1500}$ years.


* $10^{100}$: [Googol](https://en.wikipedia.org/wiki/Googol), from American mathematician Edward Kasner in 1938

* $10^{113}$  The number of hydrogen atoms it would take to pack the universe full of them.

* $10^{116}$ x 1.57: possible ways of arrangements a 6x6x6 rubik's cube can have

* <font color="blue">$10^{120}$ - max number of operations the universe can have performed [arxiv:0110141](https://arxiv.org/abs/quant-ph/0110141), based on the amount of information the Universe can register and the number of elementary operations that it can have performed over its history</font>

* <font color="blue">$10^{120}$ max number of bits of data that could have been computed in the amount of time that has elapsed so far in the universe, based on the maximum entropy of the universe, the speed of light and the minimum time taken to move information across the Planck length. Anything that requires more than this amount of data cannot have been computed yet [Source, min 2](https://youtu.be/nJObMJLweCs) - Limit on the computational power of the universe (and why Laplace's demon cannot exist)</font>

* $10^{122}$ The number of protons you could fit in the universe (incl, non-observable universe??) -> for this I need a source

* $10^{183}$: number of Planck-length ($10^{-35}$ meters) cubes in the observable universe

* <font color="blue">$10^{185}$ x 4: Without being able to go smaller, we have reached the largest number that the physical world can be used to visualize: take cubes with a side of one Planck length (10^(-35) meter) and fill the universe with them. **Anything bigger than this number cannot be explained in physical terms.**</font>

* $10^{200}$ pieces to cut a circle and fill with it a square - from circle to square: [Tarski's circle-squaring problem](https://en.m.wikipedia.org/wiki/Tarski%27s_circle-squaring_problem). Note: From square to circle: [Squaring the circle](https://en.m.wikipedia.org/wiki/Squaring_the_circle) has been proven to be impossible.

* <font color="blue">$10^{229}$ x 2.8: max number of operations if all matter in the observable universe were turned into a black hole computer. Source: [Seth Lloyd: Black hole computer](https://www.scientificamerican.com/article/black-hole-computers-2007-04/)</font>

* $10^{1500}$ years: In case proton decay is not possible, all the particles will have fused together to form iron-56 isotopes, and thus creating something called 'Iron stars'.

* $10^{272{,}000}$: number of possible universes in string theory [video](https://youtu.be/k_TEoUF12Yk)

* $10^{{10}^{16}}$: Number of universes that a human observer may distinguish = number of different configurations a typical human brain can have [arXiv:0910.1589](https://arxiv.org/abs/0910.1589)

* $10^{10^{40}}$: order of magnitude below which a counterexample to the [Mertens conjecture](https://en.m.wikipedia.org/wiki/Mertens_conjecture) is known to exist

* $10^{10^{40}}$ years: all particles will have collapsed into black holes (which would evaporate almost instantaneously on this timescale). The universe is now an absolute void with nothing inside of it and has reached its final energy state, its maximum entropy.

* $10^{10^{40}}$ years: [Boltzmann brain](https://de.m.wikipedia.org/wiki/Boltzmann-Gehirn), a self-aware entity that appears due to random quantum fluctuations.

* <font color="blue">$10^{{10}^{70}}$ - All possible quantum states a person can occupy (roughly a meter ^3 of space). If you would walk $10^{{10}^{70}}$ meters, you would start to see repetitions of yourself.[Source](https://youtu.be/8GEebx72-qs?t=287)</font>

* $10^{{10}^{76}}$ years: iron stars will slowly collapse into black holes. Source: [The Crazy Future If Protons Don't Decay](https://www.youtube.com/watch?v=5XuBIyGqE1w)

* $10^{{10}^{100}}$ or $10^{Googol}$ - [Googolplex](https://en.m.wikipedia.org/wiki/Googolplex): you cannot write this number in the (observable) universe, since there is not enough space: it has more digits than there are atoms in the observable universe. <font color="blue">**From here on, numbers become incomprehensible. If I walked a googolplex meters (in a universe that is a googolplex meters across), I would see repetitions of myself** (that is, the same arrangement of atoms as me).</font> If I went further, I would see the entire observable universe repeating (size of the universe: $(10^{26})^3$ meters). <font color="blue">**This means there is probably not another me in the universe**.</font> But if you lived in a universe that is a googolplex across, then you would by chance run into an arrangement of atoms that matches you. [numberphile](https://www.youtube.com/watch?v=8GEebx72-qs&t=0s)

* $10^{{10}^{{10}^{7}}}$: Number of possible universes, but assume that we are not limited as observers to distinguish more universes. $10^{{10}^{16}}$ is number of universes that a human observer may distinguish [arXiv:0910.1589](https://arxiv.org/abs/0910.1589)

* $10^{{10}^{{10}^{34}}}$ or $e^{{e}^{{e}^{79}}}$: upper bound given by the [Skewes number](https://de.m.wikipedia.org/wiki/Skewes-Zahl)


* $10^{{10}^{{10}^{56}}}$ years: random quantum fluctuations and quantum tunneling is predicted to give birth to another universe [Source](https://www.youtube.com/watch?v=dsWfGzxjs0w)

* $10^{{10}^{{10}^{76}}}$ years: next checkpoint for a new universe to give birth via a big bang (if protons don't decay), if we are looking at the probability of quantum fluctuations. Source: [The Crazy Future If Protons Don't Decay](https://www.youtube.com/watch?v=5XuBIyGqE1w)

* $10^{{10}^{{10}^{100}}}$ - Googolplexian. Number 10 raised to the power of one Googolplex. 10^1000000..... - line is probably trillions of light years long

* $10^{{10}^{{10}^{1000}}}$: upper bound given by the [Skewes number](https://de.m.wikipedia.org/wiki/Skewes-Zahl) without assuming the Riemann hypothesis. Assuming the Riemann hypothesis: $10^{{10}^{{10}^{961}}}$ or $e^{{e}^{{e}^{7703}}}$

* <font color="blue">$10^{{10}^{{10}^{{10}^{{10}^{1.1}}}}}$ **Poincare recurrence time**: Time until a system resets itself (the particles of a gas go back into the corner, because phase space is finite). You can apply that to the whole universe. Largest finite time ever calculated by a physicist in a published paper: [arxiv:9411193](https://arxiv.org/abs/hep-th/9411193). He calculated the [Poincare recurrence time](https://en.m.wikipedia.org/wiki/Poincar%C3%A9_recurrence_theorem) (for a certain type of universe with a certain cosmological number). [Video](https://www.youtube.com/watch?v=1GCf29FPM4k).</font>


<font color="red">**Here we cannot simply write down the numbers anymore $\downarrow \downarrow$**

* $g_{64}$ [Graham's number](https://de.m.wikipedia.org/wiki/Grahams_Zahl) $g_{64} = 3 \uparrow \underbrace{\uparrow \ldots \uparrow}_{g_{63}} \uparrow 3$ (a 3, then $g_{63}$ up-arrows, then a 3) = the biggest number used constructively. If you tried to hold all of its digits in your head, the total amount of information would turn your head into a black hole. Graham's number is **not** transfinite: it is an astronomically large finite number that is defined using a recursive formula. The number is so large that its digits cannot be written out in ordinary decimal notation.

* tree(3) [Kruskal's tree theorem](https://en.m.wikipedia.org/wiki/Kruskal%27s_tree_theorem)

* $D^{(5)}(99)$ - Loader's number. It's the fifth iteration of a certain function  D  on the value 99. TREE(3) is indistinguishable from zero compared to Loader's number

* SCG(13) and SSCG(3) - [Friedman's SSCG function](https://en.m.wikipedia.org/wiki/Friedman's_SSCG_function). SCG(13) is computable, whereas Rayo's number is uncomputable. Rayo's number >> SCG(13).

<font color="red">**$\uparrow \uparrow$ Here separates the computable from the uncomputable $\downarrow \downarrow$**

* [Busy Beaver](https://en.m.wikipedia.org/wiki/Busy_beaver): a function whose values are finite but that is in general not computable. The Busy Beaver function Σ(n) grows faster than any computable function. Hence, it is not computable; only a few values are known. Beyond a certain point, a given mathematical system loses the ability to prove its values. P.S.: Rayo's number is much bigger than the Busy Beaver values, but the Busy Beaver isn't relevant to Rayo. Rayo is fundamentally about truth rather than provability, and so the relevant "logical obstacle" is Tarski's undefinability theorem rather than the incomputability of the halting problem

* Rayo($10^{100}$) - [Rayo's number](https://en.m.wikipedia.org/wiki/Rayo). Rayo's number is the smallest number bigger than every finite number that can be named by an expression of first-order set theory with a googol symbols or fewer. Graham's number would look like a 0 next to Rayo's number. Rayo's number is finite, not transfinite, but it is uncomputable and far too large to write down explicitly in any physically available amount of time or space.

* [Chaitin's constant](https://en.m.wikipedia.org/wiki/Chaitin%27s_constant) or halting probability is a real number that, informally speaking, represents the probability that a randomly constructed program will halt. Chaitin's constant is not transfinite: it is a real number between 0 and 1. Its digits are algorithmically random, and it is uncomputable (because knowing its digits would make it possible to solve the halting problem).

* Fish number 7 belongs to the family of the fish numbers, which were defined by a Japanese googologist. **Fish numbers were often classified as “above Rayo's number”**, and they're pretty incomprehensible.

See Wiki article [Limits of computation](https://en.m.wikipedia.org/wiki/Limits_of_computation)

Paper: Seth Lloyd (2000) - [Ultimate physical limits to computation](https://arxiv.org/abs/quant-ph/9908043)

Paper: Seth Lloyd (2001) - [Computational capacity of the universe](https://arxiv.org/abs/quant-ph/0110141)

Paper: Scott Aaronson (2005) - [NP-complete Problems and Physical Reality](https://arxiv.org/abs/quant-ph/0502072)

Paper: [Estimation of the information contained in the visible matter of the universe](https://arxiv.org/abs/2112.04473)

Video: [The Boundary of Computation](https://www.youtube.com/watch?v=kmAc1nDizu0&list=WL&index=7)

Video: [The Limits of Computation](https://www.youtube.com/watch?v=ZDfaXJRtOoM)

[Amdahl's law](https://en.m.wikipedia.org/wiki/Amdahl%27s_law) is often used in parallel computing to predict the theoretical speedup when using multiple processors
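
Amdahl's law can be stated as $S(n) = 1 / ((1 - p) + p/n)$, where $p$ is the fraction of the work that can be parallelized and $n$ is the number of processors. A minimal sketch, with illustrative names:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work is spread
    evenly over `n_processors` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With 95% of the work parallelizable, the speedup can never exceed 20 regardless of how many processors are added, because the serial 5% dominates.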

[From 1,000,000 to Graham’s Number](https://waitbutwhy.com/2014/11/1000000-grahams-number.html)

The physical limit of computation that the universe can do in its entire time of existence is not yet known with certainty. However, there are a number of theoretical limits that have been proposed.

One limit is the **Landauer limit**, which states that erasing a single bit of information dissipates at least the Boltzmann constant times the temperature of the system times $\ln 2$ as heat. This limit implies that the number of irreversible operations that can be performed is bounded by the total energy available.
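
As a quick numerical check of the Landauer limit, the sketch below evaluates $k_B T \ln 2$. Room temperature (300 K) is an assumption chosen for illustration.

```python
import math

K_B = 1.380_649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat dissipated when one bit is erased at the given temperature."""
    return K_B * temperature_k * math.log(2)


print(f"{landauer_limit_joules(300.0):.3e} J per erased bit at 300 K")
```

At 300 K this is about $2.9 \times 10^{-21}$ joules per bit, matching the figure quoted in the list of physical limits.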

Another limit is the **Bremermann limit**, which states that the maximum computational speed of a self-contained system is proportional to its mass, with the constant $c^2/h$ fixed by the speed of light and the Planck constant. This limit implies that there is a fundamental limit to how fast information can be processed, even in a perfect computer.
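
The numbers behind Bremermann's limit and the Earth-sized computer example can be reproduced with a few lines. This is a rough sketch: the Earth mass value and the assumption that testing one key costs one bit operation are simplifications made here.

```python
C = 299_792_458.0         # speed of light in m/s
H = 6.626_070_15e-34      # Planck constant in J*s
EARTH_MASS_KG = 5.972e24  # assumption: approximate mass of the Earth


def bremermann_rate(mass_kg: float) -> float:
    """Maximum processing rate in bits per second for a system of this mass."""
    return mass_kg * C**2 / H


def seconds_to_enumerate(key_bits: int, mass_kg: float) -> float:
    """Time to step through all 2**key_bits keys at one key per bit operation."""
    return 2**key_bits / bremermann_rate(mass_kg)


print(f"{bremermann_rate(1.0):.4e} bits/s per kg")
for bits in (128, 256, 512):
    print(bits, f"{seconds_to_enumerate(bits, EARTH_MASS_KG):.2e} s")
```

Under these assumptions a 256-bit key space takes on the order of two minutes, while a 512-bit key space takes around $10^{79}$ seconds, in line with the orders of magnitude quoted in the list above.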

The [**Margolus-Levitin theorem**](https://de.m.wikipedia.org/wiki/Margolus-Levitin-Theorem) states that a system with average energy $E$ needs at least a time of $h/(4E)$ to move from one state to an orthogonal one, so the maximum computational speed per unit of energy is limited by the Planck constant. This limit implies that there is a fundamental limit to how much computation can be performed with a given amount of energy.
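
The Margolus-Levitin bound can be evaluated in the form $2E/(\pi\hbar)$ operations per second. The sketch below applies it to one joule and to the rest energy of 1 kg; treating the whole rest energy as available for computation is the idealization behind the "ultimate laptop" figure.

```python
import math

C = 299_792_458.0         # speed of light in m/s
HBAR = 1.054_571_817e-34  # reduced Planck constant in J*s


def max_ops_per_second(energy_j: float) -> float:
    """Margolus-Levitin bound on elementary operations per second."""
    return 2.0 * energy_j / (math.pi * HBAR)


print(f"{max_ops_per_second(1.0):.3e} ops/s for 1 J")
print(f"{max_ops_per_second(1.0 * C**2):.3e} ops/s for the rest energy of 1 kg")
```

The second value comes out near $5.4 \times 10^{50}$ operations per second, consistent with the roughly $5 \times 10^{50}$ quoted for a 1 kg computer.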

Combining these limits, it is possible to estimate that the maximum amount of computation that can be performed in the entire lifetime of the universe is on the order of **10^229 operations**. This is a staggeringly large number, but it is still finite. It is possible that future advances in physics will allow us to exceed this limit, but for now, it seems to be the best estimate of the physical limit of computation.

It is important to note that these limits are only theoretical. In practice, it may be impossible to build a computer that can actually achieve these limits. The laws of physics may also change over time, which could affect the limits of computation. However, these limits provide a useful starting point for understanding the potential of computation in the universe.

*More details: Fast-growing hierarchy: Largest Numbers & Fastest Growing Functions*

**g64 number [Grahams Zahl](https://de.m.wikipedia.org/wiki/Grahams_Zahl)**

* **Graham's number is the biggest number used constructively** = used in a mathematical proof, rather than just being a theoretical concept. There is a specific mathematical problem that can be solved by using Graham's number.


> $3 \uparrow \uparrow 3=3 \uparrow 3 \uparrow 3 = 3^{3^3}=7,625,597,484,987$


> $3 \uparrow \uparrow \uparrow 3=3 \uparrow \uparrow 3 \uparrow \uparrow 3$ = $3^{3^{3^{(..)^3}}}$ - This means the number 3 is 7,625,597,484,987 (7.6 trillion) times exponentiated!

> $g_1 = 3 \uparrow \uparrow \uparrow \uparrow 3 = 3 \uparrow \uparrow \uparrow 3 \uparrow \uparrow \uparrow 3$

> $g_2 = 3 \uparrow \underbrace{\uparrow \ldots \uparrow}_{g_1} \uparrow 3  = 3(G1\uparrow)3$

> $g_3 = 3 \uparrow \underbrace{\uparrow \ldots \uparrow}_{g_2} \uparrow 3 $

> ....

> $g_{64} = 3 \uparrow \underbrace{\uparrow \ldots \uparrow}_{g_{63}} \uparrow 3 = 3(G63\uparrow)3$ = [Grahams Zahl](https://de.m.wikipedia.org/wiki/Grahams_Zahl) the biggest number used constructively


* You cannot even say how many digits it has. If you tried to hold Graham's number in your head, your head would collapse into a black hole. The observable universe is far too small to contain an ordinary digital representation of Graham's number.

* Upper bound: used in combinatorics and graph theory as an upper bound for a problem about coloring graphs that are linked to higher-dimensional cubes, see ["Ramsey number for hypergraphs"](https://en.m.wikipedia.org/wiki/Ramsey%27s_theorem). (Connect every pair of corners of an $n$-dimensional cube and color each connection with one of two colors. The question is the smallest $n$ for which every coloring contains four corners lying in one plane whose six connections all have the same color.) Graham's number is an upper bound for this $n$ (and 6 or 11 are lower bounds that have been given)
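
The up-arrow notation used in the definition of $g_1, g_2, \ldots$ above can be written as a short recursive function. This is a sketch for tiny inputs only: anything close to $3 \uparrow\uparrow\uparrow 3$ is already far beyond what any computer can evaluate.

```python
def up_arrow(a: int, arrows: int, b: int) -> int:
    """Knuth's up-arrow: a, followed by `arrows` up-arrows, followed by b."""
    if arrows < 1 or b < 0:
        raise ValueError("need at least one arrow and a non-negative b")
    if arrows == 1:
        return a**b  # one arrow is ordinary exponentiation
    if b == 0:
        return 1
    # n arrows unfold into a chain of (n - 1)-arrow operations
    return up_arrow(a, arrows - 1, up_arrow(a, arrows, b - 1))


print(up_arrow(3, 1, 3))  # 27
print(up_arrow(3, 2, 3))  # 7625597484987
print(up_arrow(2, 3, 3))  # 65536
```

The second line reproduces $3 \uparrow\uparrow 3 = 3^{3^3} = 7{,}625{,}597{,}484{,}987$. The value $g_1$ uses four arrows, and each later $g_k$ uses $g_{k-1}$ arrows.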

*Still to be filed: https://de.m.wikipedia.org/wiki/K-d-Baum*

**tree(3)** [Kruskal's tree theorem](https://en.m.wikipedia.org/wiki/Kruskal%27s_tree_theorem)

* The numbers that arise from tree theorems, such as TREE(3), also come from mathematical proofs and are far larger than Graham's number


https://towardsdatascience.com/how-big-is-the-number-tree-3-61b901a29a2c

- Inf-embeddable
- Ackermann numbers as lower bound
- Proof theory: Kruskal's tree theorem
    - Well-quasi-ordering
    - Poincaré recurrence: the universe would reset itself due to entropy (limited number of steps) before the proof is finished, or the proof doesn't fit in our universe
    - Ordinals, transfinite arithmetic
- Video 1: https://www.youtube.com/watch?v=3P6DWAwwViU
- Video 2: https://www.youtube.com/watch?v=IihcNa9YAPk


**Loader's number** $D^{(5)}(99)$

* It's the fifth iteration of a certain function  D  on the value 99

* The number was an entry in a contest that limited C programs to 512 characters; with a longer program it could have been made even bigger

* https://googology.fandom.com/wiki/Loader%27s_number

* The interesting point is that  D , the function, is a very fast-growing function with a particularly compact representation, and it’s valuable to compare its growth rate with that of other computable functions.

* I’m not sure much is known about the place of  D  in the fast-growing hierarchy. The Googology entry seems to indicate that, for example, finite promise games produce a faster-growing computable function.

* **TREE(3) is indistinguishable from zero compared to Loader's number [1]**, which basically takes every bit pattern up to some n and expresses this as a program in the Calculus of Constructions. This system is a bit weaker than being Turing complete, but the programs do always terminate (this makes the number computable compared with, say, the Busy Beaver number which does a similar thing with Turing complete programs). It also has the geek cred of being represented by an obfuscated C program (the unobfuscated version is also available [2]).
  * [1] http://googology.wikia.com/wiki/Loader%27s_number
  * [2] https://github.com/rcls/busy


**SCG(13) and SSCG(3)** [Friedman's SSCG function](https://en.m.wikipedia.org/wiki/Friedman's_SSCG_function)

* The SSCG sequence begins slower than SCG, SSCG(0) = 2, SSCG(1) = 5, but then grows rapidly.

* SSCG(3) is known for surpassing TREE(3): it is much larger than both TREE(3) and the nested value TREE(TREE(…TREE(3)…)) in which the TREE function is applied TREE(3) times.

* SCG(13) is computable, whereas Rayo's number is uncomputable. From here we can already say Rayo's number >> SCG(13). For large numbers beyond $g_{64}$, we can only use bounds to pin them down, hence the uncertainty.

* Adam P. Goucher claims there is no qualitative difference between the asymptotic growth rates of SSCG and SCG. He writes "It's clear that SCG(n) ≥ SSCG(n), but I can also prove SSCG(4n + 3) ≥ SCG(n)."



<font color="red">**$\uparrow \uparrow$ Here separates the computable from the uncomputable $\downarrow \downarrow$**

**Rayo(10^100)** [Rayo's number](https://en.m.wikipedia.org/wiki/Rayo)

https://www.youtube.com/watch?v=X3l0fPHZja8

* Informally, Rayo's number is the smallest number that is too big to be named in a fixed formal language with a limited number of symbols. Even Graham's number would look like 0 next to Rayo's number.

* Rayo's number is defined as the smallest number that is larger than any finite number that can be named by an expression in the language of first-order set theory with a googol symbols or less. This means that Rayo's number is much larger than a googol.

* Compared to Busy Beaver: Rayo is way, way bigger than the Busy Beaver, but the Busy Beaver just isn't relevant to Rayo. Rayo is fundamentally about truth rather than provability, and so the relevant "logical obstacle" is Tarski's undefinability theorem rather than the incomputability of the halting problem [Source](https://math.stackexchange.com/questions/3910468/a-confusion-regarding-rayos-number-and-busy-beaver-function)


**Chaitin's constant**

* [Chaitin's constant](https://en.m.wikipedia.org/wiki/Chaitin%27s_constant) or halting probability is a real number that, informally speaking, represents the probability that a randomly constructed program will halt.

* Each halting probability is a normal and transcendental real number that **is not computable**, which means that there is no algorithm to compute its digits. Each halting probability is [Martin-Löf random](https://en.m.wikipedia.org/wiki/Algorithmically_random_sequence), meaning there is not even any algorithm which can reliably guess its digits.

**Fish number 7**

* Fish number 7 belongs to the family of the fish numbers, which were defined by a Japanese googologist. **Fish numbers were often classified as “above Rayo's number”**, and they're pretty incomprehensible.

* https://googology.fandom.com/wiki/Fish_number_7

**Mertens conjecture** $10^{10^{40}}$

* [Mertens conjecture](https://en.m.wikipedia.org/wiki/Mertens_conjecture)
* The conjecture says that the Mertens function $M(n)=\sum_{k \leq n} \mu(k)$ (the running sum of the Möbius function) always stays below the square root, $|M(n)| < \sqrt{n}$. It is disproven: at some point the sum will surpass the square root of the number
* If it had been true, it would have proved the Riemann hypothesis
* But even if we knew the first counterexample, **we could not realistically write it down**: the known bound of about $10^{10^{40}}$ is a number with roughly $10^{40}$ digits
    * PRIZE FOR SOLVING RIEMANN HYPOTHESIS $10^6$ (US dollars)
    * STARS IN THE UNIVERSE $10^{22}$
    * ATOMS IN THE UNIVERSE $10^{80}$
    * MERTENS CONJECTURE FAILS $10^{10^{40}}$
* We have no way to describe the first place where it happens, but we know it exists
* Video: https://youtu.be/uvMGZb0Suyc
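
To see why the conjecture looked so plausible, one can check it numerically for small $n$. The sketch below is only an illustration: the helper names `mobius_sieve` and `mertens_values` and the limit of 10,000 are arbitrary choices for this note. It computes the Möbius function with a sieve, accumulates the Mertens function $M(n)$, and tests $|M(n)| < \sqrt{n}$ for every $n$ from 2 up to the limit.

```python
from math import sqrt


def mobius_sieve(limit):
    """Return a list mu with mu[k] = Möbius function of k for 1 <= k <= limit."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor flips the sign
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # divisible by a square
    return mu


def mertens_values(limit):
    """Return a list M with M[n] = mu(1) + ... + mu(n); M[0] = 0."""
    mu = mobius_sieve(limit)
    values, total = [0], 0
    for k in range(1, limit + 1):
        total += mu[k]
        values.append(total)
    return values


LIMIT = 10_000
M = mertens_values(LIMIT)
holds = all(abs(M[n]) < sqrt(n) for n in range(2, LIMIT + 1))
print(M[10], holds)
```

No search of this kind can ever reach the region where the conjecture fails, which is why the disproof had to be indirect.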


**Busy Beaver - Separating the computable from the uncomputable**

* Difference between (in)finite value and (in)finite instruction: ZFC can't pin down the precise value of even BB(7918). However, ZFC can define the Busy Beaver function itself, in less than, say, a billion symbols.

* The [busy beaver function, or Radó function Σ](https://de.m.wikipedia.org/wiki/Flei%C3%9Figer_Biber), is a standard example in theoretical computer science of a function that is **finite, but in general not computable**

* **Open question**: is BB($10^{100}$) > TREE($10^{100}$)?

* The Radó function Σ is not computable by a Turing machine. Although there are a great many non-computable functions (even more than computable ones), it takes something out of the ordinary to describe a non-computable function as concretely as possible. [Source](https://www.inf-schule.de/algorithmen/berechenbarkeit/grenzenderberechenbarkeit/station_fleissigebiber)

https://en.m.wikipedia.org/wiki/Busy_beaver

$\Sigma(n)$ : The [Busy Beaver](https://en.m.wikipedia.org/wiki/Busy_beaver) function
* Consider all n-state Turing machines.
* Run each on a tape of all 0's.
* Of all machines that halted,
  * $\Sigma(n)$ = max count of 1's
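
The definition can be tried out by brute force for very small $n$. The following is a sketch under stated assumptions, not a general method: it assumes 2-symbol machines in the usual convention where every transition (including the halting one) writes a symbol and moves the head, and it simply gives up on a machine after `max_steps` steps. Because of that cut-off it only yields a lower bound in general; the names `run` and `sigma_lower_bound` are made up for this note.

```python
from itertools import product


def run(machine, n_states, max_steps):
    """Run one machine on an all-0 tape.

    Returns the number of 1s on the tape when the machine halts, or None if
    it has not halted within max_steps steps.
    """
    tape, pos, state = {}, 0, 0
    for _ in range(max_steps):
        write, move, next_state = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if next_state == n_states:       # index n_states is the halt state
            return sum(tape.values())
        state = next_state
    return None


def sigma_lower_bound(n_states, max_steps):
    """Maximum number of 1s over all n-state machines that halt in time."""
    keys = [(s, bit) for s in range(n_states) for bit in (0, 1)]
    # One action = (symbol to write, head move, next state or halt).
    actions = list(product((0, 1), (-1, 1), range(n_states + 1)))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        ones = run(dict(zip(keys, choice)), n_states, max_steps)
        if ones is not None:
            best = max(best, ones)
    return best


print(sigma_lower_bound(1, 10), sigma_lower_bound(2, 50))
```

For one and two states this reproduces the known values $\Sigma(1) = 1$ and $\Sigma(2) = 4$. The number of machines grows as $(4(n+1))^{2n}$, and no fixed step limit is guaranteed to be large enough in general, which is exactly where the uncomputability shows up.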

**The Busy Beaver function Σ(n) grows faster than any computable function. Hence, it is not computable; only a few values are known**

* Some mathematical systems lose the ability to prove its values beyond a certain point

> **$\Sigma(n)$ is not a computable function!** (No single algorithm computes it for all $n$, although each individual value $\Sigma(n)$ is a fixed finite number.)

* A computable function requires a finite number of steps to produce the output. See [Berechenbarkeit (computability)](https://de.m.wikipedia.org/wiki/Berechenbarkeit)

* This function grows faster than any computable function: if $f: \mathbb{N} \to \mathbb{N}$ is any computable function, then there exists $n_f$ such that $f(n) < \Sigma(n)$ for all $n \geq n_f$

* **This means: beyond some input $n_f$, the busy beaver function stays above any given computable function**

* There are true statements like "$\Sigma$(1000) = k" that cannot be proved in our normal axiomatic systems. Mathematics loses the ability to make claims about these numbers.

* The nineteenth busy beaver number is greater than Graham's number

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1627.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1628.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1629.png)


Source: https://www.youtube.com/watch?v=kmAc1nDizu0&list=WL&index=7

https://scottaaronson.blog/?p=6673

https://waitbutwhy.com/2014/11/1000000-grahams-number.html

###### *Non-Turing Computing (Hypercomputation)*

Here are some of the hypothetical models of hypercomputation:

* Omega machines: These are machines that can solve the halting problem, which is known to be unsolvable by a Turing machine (the name comes from Chaitin's omega number).
  * *Yet, omega appears to have two features which are normally considered contradictory: it is one of the most informative mathematical numbers imaginable, yet at the same time this information is so compressed that it cannot be deciphered. Thus omega appears to be totally structureless and random. In this sense, for omega, total information and total randomness seem to be “two sides of the same coin”. On a more pragmatic level, it seems impossible here to differentiate between order and chaos, or between knowledge and chance. **This gives a taste of what can be expected from any “hypercomputation” beyond universal computability** as defined by Turing.*
  * Source: [How to Acknowledge Hypercomputation?](https://content.wolfram.com/uploads/sites/13/2019/03/18-1-6.pdf)
* Supertasks: These are tasks that consist of infinitely many steps completed in a finite amount of time.
* Infinite time Turing machines: These are machines whose computations are allowed to run for infinitely many (transfinite) steps.

For instance, are infinity machines with geometrically squeezed time cycles, such as the ones envisioned by Weyl [7] and others [8-18], physically feasible? Motivated by recent proposals to utilize quantum computation for trespassing the Turing barrier [19-22], these accelerating Turing machines have been intensively discussed [23] among other forms of hypercomputation [24-26].

* [8] A. Grünbaum, Philosophical Problems of Space and Time (Boston Studies in the Philosophy of Science, Vol. 12), 2nd ed., Dordrecht, The Netherlands: D. Reidel Publishing Co., 1973.
* [9] J. F. Thomson, “Tasks and Supertasks,” Analysis, 15, 1954 pp. 1–13.
* [10] P. Benacerraf, “Tasks and Supertasks, and the Modern Eleatics,” Journal of Philosophy, LIX(24), 1962 pp. 765–784.
* [11] R. Rucker, Infinity and the Mind, Boston: Birkhäuser, 1982; New York: Bantam Books, 1983 (reprint).
* [12] I. Pitowsky, “The Physical Church–Turing Thesis and Physical Computational Complexity,” Iyyun, 39, 1990 pp. 81–99.
* [13] J. Earman and J. D. Norton, “Forever Is a Day: Supertasks in Pitowsky and Malament-Hogarth Spacetimes,” Philosophy of Science, 60(1), 1993 pp. 22–42.
* [14] M. Hogarth, “Predicting the Future in Relativistic Spacetimes,” Studies in History and Philosophy of Modern Physics, 24, 1993 pp. 721–739.
* [15] M. Hogarth, “Non-Turing Computers and Non-Turing Computability,” Proceedings of the Biennial Meeting of the Philosophy of Science Association (PSA), 1, 1994 pp. 126–138.
* [16] E. W. Beth, The Foundations of Metamathematics, Amsterdam: North-Holland, 1959.
* [17] E. G. K. López-Escobar, “Zeno’s Paradoxes: Pre Gödelian Incompleteness,” Yearbook 1991 of the Kurt-Gödel-Society, 4, 1991 pp. 49–63.
* [18] K. Svozil, “The Church–Turing Thesis as a Guiding Principle for Physics,” Unconventional Models of Computation (Discrete Mathematics and Theoretical Computer Science) (C. S. Calude, J. Casti, and M. J. Dinneen, eds.), New York: Springer, 1998 pp. 371–385.
* [19] C. S. Calude and B. Pavlov, “Coins, Quantum Measurements, and Turing’s Barrier,” Quantum Information Processing, 1(1-2), 2002 pp. 107–127. (Mar 1, 2002) arxiv.org/abs/quant-ph/0112087v3.
* [20] V. A. Adamyan, C. S. Calude, and B. S. Pavlov. “**Transcending the Limits of Turing Computability.**” (May 11, 2003) arxiv.org/abs/quant-ph/0304128v2.
* [21] T. D. Kieu, “Quantum Algorithm for Hilbert’s Tenth Problem,” International Journal of Theoretical Physics, 42, 2003 pp. 1461–1478. (Oct 8, 2003) arxiv.org/abs/quant-ph/0110136v3.
* [22] T. D. Kieu, “Computing the Noncomputable,” Contemporary Physics, 44, 2003 pp. 51–71. (Oct 8, 2003) arxiv.org/abs/quant-ph/0203034v4.
* [23] T. Ord, “The Many Forms of Hypercomputation,” Applied Mathematics and Computation, 178(1), 2006 pp. 143–153. dx.doi.org/10.1016/j.amc.2005.09.076.
* [24] M. Davis, “The Myth of Hypercomputation,” Alan Turing: Life and Legacy of a Great Thinker (C. Teuscher, ed.), Berlin: Springer, 2004 pp. 195–212.
* [25] F. A. Doria and J. F. Costa, “Introduction to the Special Issue on Hypercomputation,” Applied Mathematics and Computation, 178(1), 2006 pp. 1–3. dx.doi.org/10.1016/j.amc.2005.09.065.
* [26] M. Davis, “Why There Is No Such Discipline as Hypercomputation,” Applied Mathematics and Computation, 178, 2006 pp. 4–7. dx.doi.org/10.1016/j.amc.2005.09.066.



* Hypercomputing
* Gravitational Computing
* Adiabatic Quantum Computing


Quantum Gravity Computers: https://arxiv.org/abs/quant-ph/0701019


https://www.pbs.org/wgbh/nova/article/is-there-anything-beyond-quantum-computing/

> Video: [Is the universe a Turing machine?](https://youtu.be/VY6TzB_xH-k)

**Information theory & Entropy Limits**

* **A perfect Fredkin gate doesn't cost energy**!! Check for **zero power computing**
    * because: what costs the energy is not the computation itself, but the erasure of information (Landauer's principle).
    * Take a look at this original Fredkin/Toffoli paper on the subject, section 5 (**nondissipative computation**): https://cqi.inf.usi.ch/qic/82_Fredkin.pdf
    * Fredkin gate: 3 inputs, 3 outputs, reversible
    * https://medium.com/hackernoon/zero-power-computing-how-to-perform-calculations-using-zero-power-e2b4bfcd4d7e
    * https://www.wikiwand.com/en/Fredkin_gate
    * https://cs.stackexchange.com/questions/38049/why-are-reversible-gates-not-used/38053#38053
    * https://www.science.org/doi/10.1126/sciadv.1501531
* **The Cost of computation**: https://arxiv.org/abs/1905.05669
* Video: [Computing Limit](https://youtu.be/jv2H9fp9dT8)
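
A small sketch can make the "3 in, 3 out, reversible" property concrete. It models the Fredkin gate as a plain controlled swap on classical bits (the function name `fredkin` and the checks are illustrative, not taken from the paper) and verifies that no information is lost: the gate is a permutation of the 8 input patterns, it is its own inverse, and it never changes the number of 1s.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: if c == 1, swap a and b; c passes through unchanged."""
    return (c, b, a) if c else (c, a, b)


inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Reversible: the 8 inputs map onto 8 distinct outputs (a permutation).
is_bijection = sorted(outputs) == inputs
# Self-inverse: applying the gate twice restores the input.
is_self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)
# Conservative: the number of 1s is never changed.
conserves_ones = all(sum(o) == sum(i) for i, o in zip(inputs, outputs))


def and_via_fredkin(x, y):
    """AND built from one Fredkin gate with a constant 0 on the third input."""
    return fredkin(x, y, 0)[2]


print(is_bijection, is_self_inverse, conserves_ones)
```

Because ordinary gates such as AND can be obtained by feeding in constants and ignoring the extra outputs, logic built this way never has to erase a bit, which is the step that would cost energy.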

The universe is not hypercomputational (min 2): https://youtu.be/VV_kArap5TM


**Hypercomputation and non-Turing computations: Limitations of future computers**

* [Hypercomputation](https://en.m.wikipedia.org/wiki/Hypercomputation)

* [Non-Turing Computers and Non-Turing Computability](https://www.jstor.org/stable/193018)

* Church–Kalmár–Kreisel–Turing theses theoretical concerning (necessary) limitations of future computers and of deductive sciences, in view of recent results of classical general relativity theory.  https://arxiv.org/abs/gr-qc/0104023

  * According to a 1992 paper (Hogarth, Mark L. (1992). ["Does general relativity allow an observer to view an eternity in a finite time?"](https://link.springer.com/article/10.1007/BF00682813)), a computer operating in a [Malament–Hogarth spacetime](https://en.m.wikipedia.org/wiki/Malament%E2%80%93Hogarth_spacetime) or in orbit around a rotating black hole could theoretically perform non-Turing computations for an observer inside the black hole.

* Access to a CTC may allow the rapid solution to PSPACE-complete problems, a complexity class which, while Turing-decidable, is generally considered computationally intractable. - [Computability Theory of Closed Timelike Curves](https://www.scottaaronson.com/papers/ctchalt.pdf)

  * there are spacetimes in which the [CTC (closed timelike curves)](https://en.m.wikipedia.org/wiki/Closed_timelike_curve) region can be used for relativistic hypercomputation.

  * [Closed Timelike Curves Make Quantum and Classical Computing Equivalent](https://arxiv.org/abs/0808.2669)

  * While quantum formulations of CTCs have been proposed, a strong challenge to them is their ability to freely create entanglement, which quantum theory predicts is impossible. If Deutsch's prescription holds, the existence of these CTCs implies also equivalence of quantum and classical computation (both in PSPACE). If Lloyd's prescription holds, quantum computations would be PP-complete. https://en.m.wikipedia.org/wiki/Closed_timelike_curve

* Black holes as tools for quantum computing by advanced extraterrestrial civilizations: https://arxiv.org/abs/2301.09575

* [Lecture notes for the 28th McGill Invitational Workshop on Computational Complexity](https://arxiv.org/abs/1607.05256)

Penrose: is the brain a quantum gravitational computer? He wants the brain to be exploiting as yet unknown laws of quantum gravity, which would be uncomputable.

AdS/CFT: **nature seems computable** (Penrose seems to disagree): you can build the universe out of NAND gates (1:00:00 https://www.youtube.com/watch?v=nAMjv0NAESM&t=2240s)

https://scottaaronson.blog/?p=213

Videos:
* Limits of computation: https://youtu.be/ZDfaXJRtOoM
* Physical limits of computation: https://youtu.be/ZVj93b0pa2o
* What is the computational power of the universe: https://youtu.be/ROdv1v_YsAw
* Limits of computation: https://youtu.be/gV12PS19YL8

**Nature cannot solve all NP problems (the idea that soap bubbles do so is a myth)**

Scott Aaronson, on doing the soap experiment:
* you need to repeat the trial several times, it does not always find the best solution
* it seems to find a local optimum mostly
* it cannot scale to a large number of nails

In the case of soap bubbles, nature just seems to apply a really fast approximation which is accurate for small cases but breaks down at larger instances (due to actual physical properties... i.e. it is physically impossible to build arbitrarily thin rods to dip into soapy water.)
Many other NP-problems can be modeled as physical systems in which a lowest-energy-state would correspond to a solution to the combinatorial problem. You can find that "DNA computing" solves the clique problem, for instance.

https://groups.google.com/g/comp.theory/c/11lY926-P7M?pli=1

###### *Appendix*

*Moore's Law and Clockspeed*

A transistor today is on the order of 1000x1000x1000 atoms in size; quantum tunneling only becomes a problem at around 10x10x10 atoms.

Exponential improvement is the result of around 1000 innovations happening at the same time.

[Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70](https://www.youtube.com/watch?v=Nb2tebYAaOA)

Scott Aaronson: a black hole as a computer is the physical limit

https://www.maketecheasier.com/why-cpu-clock-speed-isnt-increasing/

**There are a few reasons why CPU clock speed is not increasing anymore.**

* **The physical limits of transistors.** Transistors are the basic building blocks of a CPU, and they can only switch so fast. Shrinking them no longer yields proportionally higher clock rates: leakage currents grow at small sizes, and the signal delay in the ever thinner wires between transistors does not shrink along with the transistors themselves.
* **The power consumption of CPUs.** As the clock speed of a CPU increases, so does its power consumption. This is because the transistors have to switch more often, and this requires more energy.
* **The heat generated by CPUs.** When a CPU consumes more power, it also generates more heat. This heat can cause the CPU to throttle its clock speed, or even shut down, to prevent damage.

As a result of these factors, CPU manufacturers have shifted their focus to other ways to improve CPU performance, such as increasing the number of cores and using new technologies like multithreading.

Here are some of the ways that CPU manufacturers are increasing performance without increasing clock speed:

* **Adding more cores.** A CPU with multiple cores can run multiple tasks at the same time, which can significantly improve performance.
* **Using multithreading.** Simultaneous multithreading lets a single core work on several threads at once by filling otherwise idle execution units with instructions from another thread.
* **Using new technologies.** New technologies such as 3D stacking and chiplets are allowing CPU manufacturers to pack more transistors into a smaller space, which can improve performance without increasing clock speed.

So, while CPU clock speed may not be increasing anymore, there are still many ways to improve CPU performance.

*Assembly Theory and Constructor Theory*

**Assembly Theory** (Lee Cronin and Sara Walker)

https://www.quantamagazine.org/a-new-theory-for-the-assembly-of-life-in-the-universe-20230504/

**Constructor Theory** (Chiara Marletto and David Deutsch)

https://www.quantamagazine.org/with-constructor-theory-chiara-marletto-invokes-the-impossible-20210429/


https://www.quantamagazine.org/physicists-trace-the-rise-in-entropy-to-quantum-information-20220526/

*Computability and Consciousness*

Consciousness is compression: [Will GPT-5 achieve consciousness? | Joscha Bach and Lex Fridman](https://www.youtube.com/watch?v=YDkvE9cW8rw&t=139s)

https://en.m.wikipedia.org/wiki/Hard_problem_of_consciousness

https://en.m.wikipedia.org/wiki/Philosophical_zombie

Aaronson: [Why I Am Not An Integrated Information Theorist (or, The Unconscious Expander)](https://scottaaronson.blog/?p=1799)

[Integrated information theory](https://en.m.wikipedia.org/wiki/Integrated_information_theory)
* Tegmark has also tried to address the problem of the computational complexity behind the calculations. According to Max Tegmark “the integration measure proposed by IIT is computationally infeasible to evaluate for large systems, growing super-exponentially with the system’s information content.”
* As a result, Φ can only be approximated in general. However, different ways of approximating Φ provide radically different results.
* “Which physical states are associated with consciousness, and which are not?” This question is what Scott Aaronson (2014) has dubbed the “Pretty Hard Problem” [Source](https://forum.effectivealtruism.org/posts/Qiiiv9uJWLDptH2w6/the-pretty-hard-problem-of-consciousness)
* thought experiments such as Mary the super-scientist, color spectrum inversion, or p-zombies, which are meant to draw our attention to the alleged gap between physical explanations and consciousness.

https://www.quantamagazine.org/neuroscience-readies-for-a-showdown-over-consciousness-ideas-20190306/

*Computer Performance*

The [Frontier supercomputer](https://en.m.wikipedia.org/wiki/Frontier_(supercomputer)) is capable of making 1,102,000 [TFLOPs](https://en.m.wikipedia.org/wiki/FLOPS) (1.1 quintillion calculations per second)

https://en.m.wikipedia.org/wiki/Exascale_computing

https://en.m.wikipedia.org/wiki/Floating-point_arithmetic

https://en.m.wikipedia.org/wiki/Instructions_per_second

https://en.m.wikipedia.org/wiki/Double-precision_floating-point_format

https://de.m.wikipedia.org/wiki/Gleitkommazahl#

https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_2/Fundamentals_of_data_representation/Floating_point_numbers

*Matrix Multiplication on Classical Computers*

Video 1: [How do GPUs speed up Neural Network training?](https://www.youtube.com/watch?v=EKD1kEMNeeU&list=WL&index=27)

Video 2: [EE5332 L11.3 - Matrix Multiplication on NVidia GPUs](https://www.youtube.com/watch?v=fpwq5zDBO2o&list=WL&index=28&t=362s)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1633.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1634.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1635.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1636.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1637.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1638.png)

*(RSA) Encryption*

Video [Breaking RSA](https://youtu.be/-ShwJqAalOk)

Public key cryptography
* HTTPS is built on top of this
* RSA: Rivest, Shamir, Adleman; based on prime numbers, the older and more established algorithm
* Elliptic curve cryptography
* Modular arithmetic
* Feistel cipher (block cipher): https://de.m.wikipedia.org/wiki/Feistelchiffre
* Diffie–Hellman key exchange
* Breaking RSA means factoring $n$: no polynomial-time algorithm is known for this on classical computers, whereas a quantum computer can solve it in polynomial time

Extended Euclidean algorithm

$\begin{aligned} & \text{public key } (e, n), \quad \text{private key } d \\ & n=p \cdot q \\ & \Phi(n)=(p-1) \cdot(q-1) \\ & e \cdot d \equiv 1 \bmod (\Phi(n))\end{aligned}$
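
As a worked example of these relations, the sketch below derives the private exponent $d$ with the extended Euclidean algorithm and runs one encrypt/decrypt round trip. The primes $p=61$, $q=53$, the exponent $e=17$ and the message 65 are toy values chosen for this note; real RSA uses primes with hundreds of digits plus padding, which is not shown here.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def modular_inverse(e, phi):
    """Return d with e*d ≡ 1 (mod phi), using the extended Euclidean algorithm."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e and phi(n) must be coprime")
    return x % phi


# Toy key generation (illustrative values only).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Φ(n)
e = 17                         # public exponent
d = modular_inverse(e, phi)    # private exponent

message = 65
cipher = pow(message, e, n)    # encrypt with the public key (e, n)
decrypted = pow(cipher, d, n)  # decrypt with the private key d

print(d, cipher, decrypted)
```

Anyone who can factor $n$ into $p$ and $q$ can repeat the `modular_inverse` step and recover $d$, which is why the security rests on factoring being hard.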

Fermat's factorization algorithm: if p and q are close to each other, this gives you a solution extremely quickly

$\begin{aligned} & N=a^2-b^2=(a+b) \cdot(a-b) \\ & b^2=a^2-N \\ & a=\lceil\sqrt{N}\rceil,\ \lceil\sqrt{N}\rceil+1,\ \ldots \text{ until } a^2-N \text{ is a perfect square}\end{aligned}$
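
The search described by these equations fits in a few lines. This is a minimal sketch assuming $N$ is odd (for odd $N$ the loop always terminates); the function name `fermat_factor` and the example number 5959 are chosen for illustration.

```python
from math import isqrt


def fermat_factor(n):
    """Factor an odd n by searching for a with a^2 - n a perfect square."""
    a = isqrt(n)
    if a * a < n:
        a += 1                 # start at a = ceil(sqrt(n))
    while True:
        b_squared = a * a - n
        b = isqrt(b_squared)
        if b * b == b_squared:
            return a - b, a + b   # n = (a - b) * (a + b)
        a += 1


print(fermat_factor(5959))
```

The number of loop iterations grows with the gap between the two factors, so the method is only fast when $p$ and $q$ are close together; this is why RSA key generation avoids primes that are too similar.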

[Prime Numbers and RS Encryption Algorithm](https://youtu.be/JD72Ry60eP4)


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1630.png)


[Entropy coding](https://en.m.wikipedia.org/wiki/Entropy_coding), [Arithmetic coding](https://en.m.wikipedia.org/wiki/Arithmetic_coding) and [Lossless compression](https://en.m.wikipedia.org/wiki/Lossless_compression)


**Pre-electronic computing machines (why are we working with binary?)**
- Greece: [Antikythera mechanism](https://de.m.wikipedia.org/wiki/Mechanismus_von_Antikythera) to predict solar eclipses
- Charles Babbage and the [Difference Engine and Analytical Engine](https://de.m.wikipedia.org/wiki/Analytical_Engine)
- For the decimal system: can you build a mechanical gear wheel with ten teeth, ground at exactly equal distances apart, so that it calculates accurately?
- But Babbage had trouble with his gear cutters. He had to pay a lot, because manufacturing wasn't advanced yet at that time (no large-scale automation in manufacturing).
- Later: electro-mechanical era of relays
- Then: early electronic era (1930s/40s): the first electronic logic elements, namely [thermionic valves (vacuum-tube computers)](https://de.m.wikipedia.org/wiki/Röhrencomputer)
    - Vacuum-tube computers mark the transition from [analog computers](https://de.m.wikipedia.org/wiki/Analogrechner) (= calculations by means of continuous mechanical or electrical processes) to the [minicomputers](https://de.m.wikipedia.org/wiki/Minirechner) based on [semiconductor technology](https://de.m.wikipedia.org/wiki/Halbleitertechnik).
- How they work: a vacuum tube serves as a logic element that is on or off. A heated cathode boils off electrons, an anode with a big voltage on it pulls the electrons across, and in the middle there is a grid. If you put a heavy negative bias voltage on the grid, it repels the electrons back to the cathode and doesn't let them through, so the tube is switched off. But you needed heaters to boil the electrons off the cathode, so power consumption was huge.
- One reason to use binary was that it's perfect for the logic: two-state elements are simpler to build!
- But: for decimal you need fewer digits (binary is a lot longer)
- How many more digits do you need if you go for binary rather than decimal? $\log_2 10 \approx 3.322$ (= a number needs about 3.3 times as many binary digits as decimal digits)
    - e.g. how many bits do you need to represent 99 in binary?
    - 8 bits cover 256 values, 7 bits cover 128. So 99 requires 7 bits
    - 99 has 2 decimal digits; using the factor above: 2 x 3.322 = 6.644, rounded up to 7 bits
- Alternative approach in the 1930s: bi-quinary coding, which combines a two-state part with a five-state part ([“Colossus”](https://de.m.wikipedia.org/wiki/Colossus))
- Video: [Why Use Binary? - Computerphile](https://www.youtube.com/watch?v=thrx3SBEpL8); Colossus was designed by [Tommy Flowers](https://de.m.wikipedia.org/wiki/Tommy_Flowers)
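
The digit-count arithmetic from the list above can be checked directly. The helper names below are made up for this note; `int.bit_length` is the built-in Python method that returns the number of bits needed for an integer.

```python
from math import ceil, log2

BITS_PER_DECIMAL_DIGIT = log2(10)   # about 3.322


def bits_needed(value):
    """Exact number of bits needed to write a non-negative integer in binary."""
    return max(1, value.bit_length())


def estimated_bits(decimal_digits):
    """Estimate from the number of decimal digits, rounded up to whole bits."""
    return ceil(decimal_digits * BITS_PER_DECIMAL_DIGIT)


print(round(BITS_PER_DECIMAL_DIGIT, 3), bits_needed(99), estimated_bits(2))
```

The estimate is an upper bound for all numbers with that many decimal digits, so it agrees with the exact count for 99 but overshoots for a small two-digit number such as 10, which needs only 4 bits.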


**Are 8 bits 1 byte?**
* Bits are typically used for data transfer rates (speed).
* Bytes are typically used for storage sizes (capacity).
* 1 byte converted to bits is 8 bits.
* 1 byte can therefore represent $2^8 = 256$ different states. This makes 1 byte usually the smallest addressable unit of memory, enough to represent a character such as a letter.

**Ackermann function and Sudan function - recursive and computable functions**

* In 1926 David Hilbert conjectured that every [computable](https://de.m.wikipedia.org/wiki/Berechenbarkeit) function is [primitive recursive](https://de.m.wikipedia.org/wiki/Primitiv-rekursive_Funktion), see also [computability theory](https://de.m.wikipedia.org/wiki/Berechenbarkeitstheorie).

  * Put simply, this means that every function computable by a computer **can be assembled from a few very simple rules and that the duration of the computation can be estimated in advance**. This holds for almost all functions that occur in practice.

  * This was refuted by Wilhelm Ackermann and Gabriel Sudan, both students of Hilbert, by means of different functions that were published close together (Sudan 1927 and Ackermann 1928). The Sudan function and the Ackermann function were thus the first published functions that are not primitive recursive.

* The [Ackermann function](https://de.m.wikipedia.org/wiki/Ackermannfunktion) is an extremely fast-growing mathematical function found by Wilhelm Ackermann in 1926, which is used in theoretical computer science to demonstrate the limits of computer and computation models. Today a whole range of functions is called Ackermann function. They all follow a construction rule similar to that of the original Ackermann function and show similar growth behavior.

* The [Sudan function](https://de.wikipedia.org/wiki/Sudanfunktion) is a recursive computable function that is total μ-recursive but not primitive recursive, which it has in common with the better-known Ackermann function.
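
As an illustration, here is the common two-argument variant of the Ackermann function (often called the Ackermann–Péter function, one of the "whole range" of variants mentioned above, not Ackermann's original three-argument definition). The nested recursion in the last line is what takes it out of the primitive recursive functions. Only tiny arguments are feasible.

```python
import sys

sys.setrecursionlimit(10_000)   # the recursion gets deep even for small inputs


def ackermann(m, n):
    """Two-argument Ackermann–Péter function for small non-negative m, n."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(1, 2), ackermann(2, 3), ackermann(3, 3))
```

Already `ackermann(4, 2)` is a number with 19,729 decimal digits, so this direct recursion cannot compute it in practice.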





#### <font color="blue">**Quantum Mechanics**

###### *(2018) Supervised learning with quantum enhanced feature spaces (**Data Encoding / Embedding**)*

> **The easiest way for a parameter to enter a circuit is through a rotation of a single qubit, in proportion to the value of a single datapoint, so a single scalar value:**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_190.png)

> **You can also use a sequence of rotations to embed data (data re-uploading).** There may be free parameters in between as well. This can make a more complex function available than if you upload only once in a single rotation.

> **Learnable embeddings**: The other idea is to actually have a trainable embedding layer. Not to worry about training the unitary of the circuit, but worry about training the embedding and then use standard quantum information metrics to classify the data.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_191.png)

**Basis (State) Encoding**

* We have data points x1 and x2

* We first need to represent data in binary form, like 00 and 10

* Then we encode it into a QC in a way, such that we have basis states that represents them like |00> and |10>

* with all other basis states having probability zero - represented in the amplitude vector

* So we encode data in quantum state that is aligned with basis states

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_856.png)
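
A minimal sketch of the resulting amplitude vector, using plain Python lists instead of a quantum SDK. It assumes the two data points are the bitstrings `00` and `10`, that they are stored in an equal superposition, and that the leftmost bit is the most significant one; the function name `basis_encode` is made up for this note.

```python
from math import sqrt


def basis_encode(bitstrings):
    """Amplitude vector of an equal superposition of the given basis states.

    Assumes all bitstrings are distinct and have the same length.
    """
    n_qubits = len(bitstrings[0])
    amplitudes = [0.0] * (2 ** n_qubits)
    weight = 1 / sqrt(len(bitstrings))
    for bits in bitstrings:
        amplitudes[int(bits, 2)] = weight   # index of the basis state |bits>
    return amplitudes


state = basis_encode(["00", "10"])
probabilities = [a * a for a in state]
print(state, probabilities)
```

Only the entries for |00> and |10> are non-zero, each with probability 1/2, and all other basis states have probability zero.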

**Amplitude Encoding**

* we want to encode our classical information into an amplitude vector

* you have a classical data vector with 4 entries (features) x1

* now construct a circuit, so that we have an amplitude vector that corresponds to the values in the classical data vector:

	* we have 2 qubits initialized in the ground state

	* then we apply some operations U (x1) on these qubits

	* and then we get a quantum state that corresponds to an amplitude vector that exactly represents our classical data points

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_857.png)
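
As a rough sketch of amplitude encoding, the classical vector only has to be normalized so that its squared entries sum to 1. The function name `amplitude_encode` and the example values in `x1` are assumptions chosen for illustration; constructing the actual state-preparation circuit U(x1) is not shown here.

```python
import numpy as np

def amplitude_encode(x):
    """Normalize a classical vector so it can serve as an amplitude vector."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two (2**n_qubits)")
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return x / norm

x1 = [1.0, 2.0, 2.0, 4.0]            # hypothetical data vector with 4 features
amplitudes = amplitude_encode(x1)    # amplitudes of |00>, |01>, |10>, |11>
probabilities = amplitudes ** 2      # measurement probabilities
print(amplitudes, probabilities.sum())
```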

**Angle Encoding**

* we have 2-dimensional data that we can write as a two-dimensional vector

* then I take a number of qubits equal to the number of features (rows / entries in the classical vector)

* then I apply a rotation to each of these qubits, with a rotation angle equal to the value of the corresponding feature

	* for example I rotate the first qubit about some axis, say Z. The rotation angle is equal to the first classical feature value

	* then I take my second qubit and rotate it, for example again about the Z axis, and the angle of the rotation is equal to the second feature value of my data point

* for higher-dimensional data, for example a three-dimensional classical vector, I simply add more qubits to my system to encode this information

* For example: [Z feature map (Qiskit)](https://qiskit.org/documentation/stubs/qiskit.circuit.library.ZFeatureMap.html#qiskit.circuit.library.ZFeatureMap):
	* apply a Hadamard operator first to each of the qubits, and then encode the data values in rotations
	* and then repeat this as many times as you want (stacking operations sequentially like in the image) to encode the data multiple times in a row

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_858.png)
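
A minimal NumPy sketch of angle encoding with one qubit per feature follows. It applies a Hadamard and then a Z rotation to each qubit, because a Z rotation acting directly on |0> would only add a global phase. The gate convention `rz(theta) = diag(e^{-i theta/2}, e^{i theta/2})`, the function names, and the feature values are assumptions for illustration and are not claimed to match Qiskit's exact gate conventions.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate

def rz(theta):
    """Rotation about the Z axis by angle theta (assumed convention)."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

def angle_encode(features):
    """One qubit per feature: start in |0>, apply H, then RZ(feature)."""
    zero = np.array([1, 0], dtype=complex)
    single_qubit_states = [rz(x) @ H @ zero for x in features]
    return reduce(np.kron, single_qubit_states)  # tensor product of all qubits

state = angle_encode([0.3, 1.2])  # hypothetical 2-feature data point
print(np.round(state, 3))
```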

**Higher Order Encoding**

* there is no theoretical justification for doing this, and it is not known whether it is better or not

* the idea comes from the paper [Supervised learning with quantum enhanced feature spaces](https://arxiv.org/abs/1804.11326)

* Basic idea: let's do an encoding that is hard to reproduce classically and simulate, and then maybe we get some quantum advantage in doing this

* we have some two dimensional data with a vector with 2 entries

	* choose number of qubits = number of feature values

	* then apply a Hadamard on each qubit

	* and then do rotations about some axis, say Z: the angle for the first qubit is the first feature value, and likewise the second feature value for the second qubit

	* and then we apply some entanglement gates between these qubits

	* then we do another rotation (for example again Z axis), but this rotation angle depends on some function of the product of the feature values R(x^1 * x^2)

	* this is where the name comes from: we encode in a higher order product space

	* and this whole block of this encoding can be repeated, which is called the depth of the feature maps

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_860.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_861.png)
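
The block structure described above can be sketched for two qubits in NumPy. The sketch uses the identity that CNOT, then RZ(φ) on the target, then CNOT equals exp(-i φ/2 · Z⊗Z), so all data-dependent gates are diagonal phases. The function name, the choice of the plain product x1·x2 as the second-order angle, and the rotation convention are assumptions; the paper and library implementations may use a different function of the features.

```python
import numpy as np

def higher_order_encode(x1, x2, depth=1):
    """Two-qubit feature map: H on each qubit, RZ(x1), RZ(x2), then a ZZ phase with angle x1*x2."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    HH = np.kron(H, H)
    # Z eigenvalues of qubit 1 and qubit 2 on |00>, |01>, |10>, |11>
    z1 = np.array([1, 1, -1, -1])
    z2 = np.array([1, -1, 1, -1])
    # All rotations are diagonal, so they combine into one phase per basis state
    phases = np.exp(-0.5j * (x1 * z1 + x2 * z2 + x1 * x2 * z1 * z2))
    state = np.array([1, 0, 0, 0], dtype=complex)
    for _ in range(depth):  # repeating the block gives the depth of the feature map
        state = phases * (HH @ state)
    return state

state = higher_order_encode(0.3, 1.2)
# A two-qubit state (a, b, c, d) is a product state only if a*d - b*c = 0
entanglement_witness = abs(state[0] * state[3] - state[1] * state[2])
print(np.round(state, 3), entanglement_witness)
```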

**Other Encodings**

* Hamiltonian evolution ansatz encoding

* Displacement Encoding

* IQP Encoding (Instantaneous quantum polynomial)

* Squeezing Encoding

* QAOA Encoding

###### ***Mathematical Formulation & Postulates of Quantum Mechanics***

https://phys.org/news/2023-01-scientists-quantum-harmonic-oscillator-room.html

> Video: [Map of Quantum Physics](https://youtu.be/gAFAj3pzvAA)

"There are (at least) two possible ways to formulate precisely (i.e. mathematically) elementary
QM. The eldest one, historically speaking, is due to von Neumann in essence, and is formulated using the language of Hilbert spaces and the spectral theory of unbounded operators. A more recent and mature formulation was developed by several authors in the attempt to solve quantum field theory problems in mathematical physics. It relies on the theory of abstract algebras (*-algebras and C* -algebras) that are built mimicking the operator algebras defined and studied, again, by von Neumann (nowadays known as W* -algebras or von Neumann algebras), but freed from the Hilbert-space structure. The core result is the celebrated GNS theorem (after Gelfand, Najmark and Segal), that we will prove in Chap. 14. The newer formulation can be considered an extension of the former one, in a very precise sense that we shall not go into here, also by virtue of the novel physical context it introduces and by the possibility of treating physical systems with infinitely many degrees of freedom, i.e. quantum fields. In particular, this second formulation makes precise sense of the demand for locality and covariance of relativistic quantum field theories, and allows to extend quantum field theories to a curved spacetime."
-Valter Moretti

https://phys.org/news/2022-11-common-misconceptions-quantum-physics.html

https://www.derstandard.de/story/2000140294674/wie-die-quantenphysik-mit-unserer-vorstellung-von-realitaet-aufraeumt

https://physicsworld.com/a/how-the-stern-gerlach-experiment-made-physicists-believe-in-quantum-mechanics/

[Correspondence principle](https://de.m.wikipedia.org/wiki/Korrespondenzprinzip): Classical quantities are replaced by operators. The quantum mechanical probability density of finding a particle at a location is proportional to the squared modulus of the wave function of the matter wave at that location. For large quantum numbers, the quantum mechanical probability density asymptotically approaches the classical one.

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Korrespondenzprinzip.svg/640px-Korrespondenzprinzip.svg.png)


*Postulates of quantum mechanics (Copenhagen interpretation)*

1. **State**: The state of a physical system at a time $t_{0}$ is defined by specifying a complex state vector $\left|\psi\left(t_{0}\right)\right\rangle$ belonging to the state space $\mathcal{H}$. Vectors that differ only by a nonzero factor $c \in \mathbb{C}$ describe the same state. The state space of the system is a Hilbert space.

2. **Observable**: Every quantity $A$ that can physically be "measured" is described by a Hermitian operator $\hat{A}$ acting on the state space. This operator is called an observable and has a real spectrum with a complete so-called spectral family, consisting of a "discrete" part with eigenvectors and eigenvalues (point spectrum) and a continuum.

3. **Measurement result**: The result of measuring a physical quantity $A$ can only be one of the eigenvalues of the corresponding observable $\hat{A}$ or, if the operator has a continuous spectrum, a measurable set from the continuum.

4. **Measurement probability for a discrete non-degenerate spectrum**: If the physical quantity $A$ is measured on a system in the state $|\psi\rangle$, the probability $P\left(a_{n}\right)$ of obtaining the non-degenerate eigenvalue $a_{n}$ of the corresponding observable $\hat{A}$ (with associated eigenvector $\left|u_{n}\right\rangle$) is $P\left(a_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$. Here $\psi$ and $u_{n}$ are assumed to be normalized.

5. **The time evolution of the state vector** $|\psi(t)\rangle$ is given by the following Schrödinger equation, where $\hat{H}(t)$ is the observable associated with the total energy of the system:

>$\mathrm{i} \hbar \frac{\partial}{\partial t}|\psi(t)\rangle=\hat{H}(t)|\psi(t)\rangle$

http://vergil.chemistry.gatech.edu/notes/quantrev/node20.html

* [Mathematical_formulation_of_quantum_mechanics](https://en.m.wikipedia.org/wiki/Mathematical_formulation_of_quantum_mechanics)
* [C*-algebra](https://en.m.wikipedia.org/wiki/C*-algebra)
* [Quantum_geometry](https://en.m.wikipedia.org/wiki/Quantum_geometry)
* [Noncommutative_geometry](https://en.m.wikipedia.org/wiki/Noncommutative_geometry)
* [Geometric_quantization](https://en.m.wikipedia.org/wiki/Geometric_quantization)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_209.png)


**Video: [Crash course in density matrices](https://www.youtube.com/watch?v=1tserF6VGqI)**

**Single spin one half particle, focus on spin degrees of freedom & Pauli-matrices**

* when the spin degrees of freedom interact with an electromagnetic field, the Pauli matrices come into play:

> $\sigma^{Z}=\left(\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right) \quad \sigma^{X}=\left(\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right) \quad \sigma^{Y}=\left(\begin{array}{cc}0 & -i \\ i & 0\end{array}\right)$

* we have chosen a basis in such a way that the Pauli Z matrix is diagonal. Here are its basis vectors, the spin up in the z direction and the spin down direction, written as column vectors:

> $|\uparrow\rangle=\left(\begin{array}{l}1 \\ 0\end{array}\right) \quad |\downarrow\rangle=\left(\begin{array}{l}0 \\ 1\end{array}\right)$

* the eigenvectors of the Pauli X matrix can be re-expressed in terms of these vectors. The one pointing in the positive x direction can be written in the following way:

> $|\rightarrow\rangle=\frac{1}{\sqrt{2}}(|\uparrow\rangle+|\downarrow\rangle)$

**The states we usually work with in quantum mechanics are called 'pure states'.**

* Any state can be written as the linear combination of the up and down vectors in the z-direction

> $|\psi\rangle=a|\uparrow\rangle+b|\downarrow\rangle$

* a and b are complex numbers whose squared moduli sum to 1

> $\langle\psi \mid \psi\rangle=|a|^{2}+|b|^{2}=1$

* a and b can be referred to as the probability amplitudes, where their modulus squared are the probabilities that the system is in a particular configuration.

* If we perform an ensemble of measurements on the system, we will find that the mean or expected value, for example of the Pauli-Z matrix, will be given by:

> $\left\langle\sigma^{Z}\right\rangle=\left\langle\psi\left|\sigma^{Z}\right| \psi\right\rangle$

**Dynamics: if we want to study how a quantum system changes in time, we usually refer to the Schroedinger equation (here in units where $\hbar=1$).**

> $i \frac{d}{d t}|\psi(t)\rangle=\hat{H}|\psi(t)\rangle$

* the operator $\hat{H}$ is called the Hamiltonian and it tells us about the total energy in a system, and how things in a system interact with each other.

* the easiest way to solve this problem is to first solve the eigenvalue problem, which involves finding the eigenvectors and the eigenvalues of the Hamiltonian. We label the different eigenvalues and eigenvectors with the index k:

> $\hat{H}\left|E_{k}\right\rangle=E_{k}\left|E_{k}\right\rangle$

* This allows us to know that the Eigenvectors, when plugged into the Schroedinger equation ...

> $i \frac{d}{d t}\left|E_{k}\right\rangle=\hat{H}\left|E_{k}\right\rangle=E_{k}\left|E_{k}\right\rangle$

* ... just pick up a phase in time depending on the energy they correspond to:

> $\left|E_{k}(t)\right\rangle=e^{-i E_{k} t}\left|E_{k}\right\rangle$

**Now to a more general situation with some generic state $\psi$**

* we can decompose $\psi$ in terms of energy Eigenbasis:

> $|\psi\rangle=\sum_{m} c_{m}\left|E_{m}\right\rangle$

* where $c_{m}$ is given by the inner product of the energy Eigenvector labelled by m and $\psi$ itself:

> $c_{m}=\left\langle E_{m} \mid \psi\right\rangle$

* We can then time-evolve the state by simply time-evolving the energy Eigen-Kets:

> $|\psi(t)\rangle=\sum_{m} c_{m} e^{-i E_{m} t}\left|E_{m}\right\rangle$

* Tracking the expectation value of an observable is quite easy: sandwiching the operator between the time-evolved bra and ket gives the following equation:

> $\langle A(t)\rangle=\sum_{m, n} \bar{c}_{n} c_{m} A_{n, m} e^{i\left(E_{n}-E_{m}\right) t}$

* where we have labelled the matrix entries of $A$ in the energy eigenbasis in the following way:

> $A_{n, m}=\left\langle E_{n}|A| E_{m}\right\rangle$
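
The recipe above (diagonalize, expand, attach phases) can be checked numerically. The sketch below assumes a hypothetical single-spin Hamiltonian $\hat{H}=\sigma^{X}$ with $\hbar=1$ and an initial state spin up; for that choice the expectation value of $\sigma^{Z}$ should oscillate as $\cos(2t)$. All variable and function names are illustrative.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

H = sigma_x                                # hypothetical Hamiltonian (hbar = 1)
psi0 = np.array([1, 0], dtype=complex)     # initial state: spin up

energies, eigvecs = np.linalg.eigh(H)      # columns of eigvecs are the |E_k>
c = eigvecs.conj().T @ psi0                # c_m = <E_m|psi>

def evolve(t):
    """|psi(t)> = sum_m c_m exp(-i E_m t) |E_m>."""
    return eigvecs @ (np.exp(-1j * energies * t) * c)

def expectation(A, t):
    """<A(t)> computed directly from the evolved state vector."""
    psi_t = evolve(t)
    return np.vdot(psi_t, A @ psi_t).real

def expectation_eigenbasis(A, t):
    """<A(t)> = sum_{n,m} conj(c_n) c_m A_nm exp(i (E_n - E_m) t)."""
    A_nm = eigvecs.conj().T @ A @ eigvecs
    phases = np.exp(1j * (energies[:, None] - energies[None, :]) * t)
    return np.sum(np.conj(c)[:, None] * c[None, :] * A_nm * phases).real

t = 0.7
print(expectation(sigma_z, t), expectation_eigenbasis(sigma_z, t), np.cos(2 * t))
```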

**Properties of the trace of a matrix**

* Let the states $|j\rangle$, $j=1,2,\ldots,N$, form a complete orthonormal basis. Then the trace of a matrix can be written in the following way:

> $\operatorname{Tr}(A)=\sum_{j}\langle j|A| j\rangle$

* the trace is a linear map, so the trace of a sum of two matrices is the sum of their traces, and scalar multiples can be pulled outside of the trace:

> $\operatorname{Tr}(A+B)=\operatorname{Tr} A+\operatorname{Tr} B$ with $\operatorname{Tr}(c A)=c \operatorname{Tr} A$

* When we take the trace of two matrices multiplied by each other, the trace is invariant under swapping the two matrices inside of the trace.

> $\operatorname{Tr}(A B)=\operatorname{Tr}(B A)$
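
These three trace properties are easy to verify numerically. The following sketch uses random complex 3×3 matrices as stand-ins; the matrices, the scalar `c`, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
c = 2.0 - 0.5j

# Tr(A) = sum_j <j|A|j> in the standard orthonormal basis
basis = np.eye(3)
trace_from_basis = sum(np.vdot(basis[j], A @ basis[j]) for j in range(3))

additive = np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
homogeneous = np.isclose(np.trace(c * A), c * np.trace(A))
cyclic = np.isclose(np.trace(A @ B), np.trace(B @ A))
print(additive, homogeneous, cyclic)
```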

**Density Matrices**

* Going from a pure state represented by a Ket we can introduce the density matrix which is a completely equivalent way to represent the state of a quantum system

* the density matrix in this context is just the outer product of the state with itself:

> $\rho=|\psi\rangle\langle\psi|$

* the expectation value is rewritten in terms of a trace, but it is mathematically equivalent to the earlier expression:

> $\left\langle\sigma^{Z}\right\rangle=\operatorname{Tr}\left(\rho \sigma^{Z}\right)=\left\langle\psi\left|\sigma^{Z}\right| \psi\right\rangle$

* the density matrix has two fundamental properties. First, its trace is 1, shown here for a pure state:

> $\operatorname{Tr} \rho=1$

> $\operatorname{Tr}|\psi\rangle\langle\psi|=\langle\psi \mid \psi\rangle=1$

* and it's a positive operator:

> $\rho \geq 0$

* If I take any state vector $|\phi\rangle$ from our state space and perform the following operation, then the result is always greater than or equal to zero:

> $\langle\phi|\rho| \phi\rangle=|\langle\phi \mid \psi\rangle|^{2} \geq 0$
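
A small numerical check of these density-matrix statements, assuming an arbitrary normalized example state with amplitudes $a=0.6$ and $b=0.8i$ (chosen only for illustration):

```python
import numpy as np

sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

a, b = 0.6, 0.8j                       # hypothetical amplitudes, |a|^2 + |b|^2 = 1
psi = np.array([a, b], dtype=complex)
rho = np.outer(psi, psi.conj())        # rho = |psi><psi|

exp_from_ket = np.vdot(psi, sigma_z @ psi).real   # <psi|sigma_z|psi>
exp_from_rho = np.trace(rho @ sigma_z).real       # Tr(rho sigma_z)
trace_rho = np.trace(rho).real                    # should be 1
eigenvalues = np.linalg.eigvalsh(rho)             # all >= 0 for a positive operator
print(exp_from_ket, exp_from_rho, trace_rho, eigenvalues)
```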

**Dynamics: von Neumann equation of time evolution**

* starting from the Schroedinger equation we can derive the von Neumann equation of time evolution:

> $i \frac{\partial \rho}{\partial t}=[\hat{H}, \rho]$

* It features the commutator of the Hamiltonian and the density matrix itself

* then the expectation value for an observable in time can be rewritten in the following way:

> $\langle A(t)\rangle=\operatorname{Tr}(\rho(t) A)$

* where the density matrix $\rho$ evolves with the Hamiltonian being applied to it in time:

> $\rho(t)=e^{-i \hat{H} t} \rho e^{i \hat{H} t}$

* similarly, this can be re-expressed in the energy eigenbasis:

> $\rho(t)=\sum_{m, n} \rho_{m, n} e^{-i\left(E_{m}-E_{n}\right) t}\left|E_{m}\right\rangle\left\langle E_{n}\right|$

* where $\rho_{m,n}$ can be written in the following way, where we have re-expressed its entries in terms of the energy Eigenbasis:

> $\rho_{m, n}=\left\langle E_{m}|\rho| E_{n}\right\rangle$
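
The von Neumann equation can be checked numerically by comparing a finite-difference time derivative of $\rho(t)=e^{-i \hat{H} t} \rho e^{i \hat{H} t}$ with the commutator. The sketch assumes the same hypothetical setup as a single spin with $\hat{H}=\sigma^{X}$, $\hbar=1$, and an initial spin-up state; the step size `dt` is an arbitrary choice.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

H = sigma_x                                   # hypothetical Hamiltonian (hbar = 1)
up = np.array([1, 0], dtype=complex)
rho0 = np.outer(up, up.conj())                # pure state |up><up|

energies, V = np.linalg.eigh(H)

def U(t):
    """Time-evolution operator exp(-i H t), built from the eigendecomposition of H."""
    return V @ np.diag(np.exp(-1j * energies * t)) @ V.conj().T

def rho_t(t):
    """rho(t) = exp(-i H t) rho exp(+i H t)."""
    u = U(t)
    return u @ rho0 @ u.conj().T

t, dt = 0.4, 1e-6
lhs = 1j * (rho_t(t + dt) - rho_t(t - dt)) / (2 * dt)   # i d(rho)/dt, central difference
rhs = H @ rho_t(t) - rho_t(t) @ H                       # [H, rho]
expectation_z = np.trace(rho_t(t) @ sigma_z).real       # <sigma_z(t)> = Tr(rho(t) sigma_z)
print(np.max(np.abs(lhs - rhs)), expectation_z)
```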


**Mixed States**

* what if we are unsure which pure state our system is in? (We are ignorant of which pure state we are in)

* then we can describe a statistical ensemble of pure states, which we call a mixed state

> $\rho=\sum_{j} p_{j}\left|\psi_{j}\right\rangle\left\langle\psi_{j}\right|$

* the ensemble is written as a sum of pure states with probabilities $p_j$, which are nonnegative and sum up to one:

> $p_{j} \geq 0 \quad \sum_{j} p_{j}=1$

* Mixed states are also trace 1 and are positive operators:

> $\operatorname{Tr} \rho=1, \quad \rho \geq 0$

* Dynamics and the expectation values work in an identical way

**Non-uniqueness of mixed states decomposition**

* Interestingly it's not completely correct to interpret $p_j$ as the probability of being in a particular state labeled by j.

* To see this, consider the following example. Let $\rho$ be a mixed state written in the following way:

> $\rho=\frac{4}{5}|\downarrow\rangle\langle\downarrow|+\frac{1}{5}|\uparrow\rangle\langle\uparrow|$

* we say the system is in state down with probability 4/5 and in state up with probability 1/5.

* What if I instead prepared two other states $|a\rangle$ and $|b\rangle$ with the following probability amplitudes of being in the down or up state:

> $|a\rangle=\sqrt{\frac{4}{5}}|\downarrow\rangle+\frac{1}{\sqrt{5}}|\uparrow\rangle$

> $|b\rangle=\sqrt{\frac{4}{5}}|\downarrow\rangle-\frac{1}{\sqrt{5}}|\uparrow\rangle$

* Then if we prepare these states with probability one-half:

> $\rho=\frac{1}{2}|a\rangle\langle a|+\frac{1}{2}|b\rangle\langle b|$

* We get the same density matrix as before: expanding the definitions of a and b, the cross terms cancel and we arrive at our original density matrix:

> $\rho=\frac{4}{5}|\downarrow\rangle\langle\downarrow|+\frac{1}{5}|\uparrow\rangle\langle\uparrow|$

* Therefore two different ensembles of pure states can give rise to the same mixed state, and we must be careful when interpreting $p_j$ strictly as the probability that the system is in the associated pure state of the sum.

* Finally, given a density matrix, how can you tell whether it is pure or mixed? There is a simple mathematical check: square the matrix and take its trace. If the trace is still 1, the density matrix is pure; if it is less than 1, it is mixed. This value does not change under the time evolution.

> $\operatorname{Tr}\left(\rho^{2}\right) \leq 1$
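
The example above can be reproduced numerically. The sketch builds both ensembles, confirms that they give the same density matrix, and evaluates the purity $\operatorname{Tr}(\rho^{2})$; the helper name `projector` is an illustrative assumption.

```python
import numpy as np

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

def projector(v):
    """Outer product |v><v|."""
    return np.outer(v, v.conj())

# Ensemble 1: down with probability 4/5, up with probability 1/5
rho_1 = 0.8 * projector(down) + 0.2 * projector(up)

# Ensemble 2: |a> and |b>, each with probability 1/2
a = np.sqrt(4 / 5) * down + np.sqrt(1 / 5) * up
b = np.sqrt(4 / 5) * down - np.sqrt(1 / 5) * up
rho_2 = 0.5 * projector(a) + 0.5 * projector(b)

same_state = np.allclose(rho_1, rho_2)          # the two ensembles are indistinguishable
purity_mixed = np.trace(rho_1 @ rho_1).real     # (4/5)^2 + (1/5)^2 = 0.68 < 1
purity_pure = np.trace(projector(a) @ projector(a)).real
print(same_state, purity_mixed, purity_pure)
```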

###### *Ehrenfest and the Liouville correspondence*

The Ehrenfest theorem and Liouville's theorem provide important connections between the microscopic (quantum) and the macroscopic (classical) descriptions of physical systems.

**Ehrenfest Theorem**: In quantum mechanics, the Ehrenfest theorem provides a link between the quantum mechanical description of the motion of particles and the classical motion as described by Newton's laws. Named after Paul Ehrenfest, who proved it in 1927, it demonstrates how the expectation values (average values) of quantum observables (like position and momentum) evolve with time. Under some circumstances, when quantum effects are small, these expectation values follow trajectories similar to classical mechanics.

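Mathematically, the Ehrenfest theorem relates the time derivative of the expectation value of an operator $A$ to the expectation value of its commutator with the Hamiltonian $H$: $\frac{d}{d t}\langle A\rangle=\frac{1}{i \hbar}\langle[A, H]\rangle+\left\langle\frac{\partial A}{\partial t}\right\rangle$. For position $x$ and momentum $p$ of a particle with $H=\frac{p^{2}}{2 m}+V(x)$, it can be written as:

> $\frac{d\langle x\rangle}{d t}=\frac{\langle p\rangle}{m}$ (analogous to velocity = momentum/mass)

> $\frac{d\langle p\rangle}{d t}=-\left\langle\frac{d V}{d x}\right\rangle$ (analogous to Newton's second law)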

where V is the potential energy and m is the mass of the particle.

**Liouville's Theorem**: In classical statistical mechanics, Liouville's theorem describes the time evolution of the phase space distribution function (a function that gives the probability density of finding the system in a state with given position and momentum) in a Hamiltonian system (a system governed by Hamilton's equations). The theorem states that the distribution function is constant along the trajectories of the system. Equivalently, phase space volume is conserved under the Hamiltonian flow.

This provides a bridge between microscopic (individual particle trajectories) and macroscopic (bulk or average behavior) properties of the system. Liouville's theorem is a cornerstone in the development of statistical mechanics and the concept of entropy.

Both of these theorems provide important links between classical and quantum descriptions of physics. They show that in certain limits, classical and quantum mechanical descriptions become similar, providing a physical intuition for the correspondence principle, which states that quantum mechanics must reproduce classical physics in the limit of large quantum numbers.

###### ***Ket $|\psi\rangle$ - State Space $V$(Vector)*** *- also: Wave Function (Pure State, Postulate I)*

> <font color="blue">**<u>Postulate I of Quantum Mechanics</u>: The state of a physical system is characterized by a state vector that belongs to a complex vector space $\mathcal{V}$, called the state space of the system**

* State Space Formalism: unification of Schrodinger wave mechanics and matrix mechanics (Heisenberg, Born, Jordan), both are equivalent.

* **Matrix mechanics ('matrix formulation'): Most useful when we deal with finite, discrete bases (like spin). Then it reduces to the rules of simple matrix multiplication**. Kronecker delta

* **Schrodinger wave mechanics: for continuous basis (like position)** Dirac delta function

* State Space: the vector space in which quantum systems live

  * Euclidean space: classical physics, 3D, real, inner product (Hilbert Space)

  * State space: quantum physics, infinite dimensions, complex numbers, inner product (Hilbert Space)

Video: [Dirac notation: state space and dual space](https://www.youtube.com/watch?v=hJoWM9jf0gU)

<font color="blue">*Ket Algebra - Quantum State Vector*

**A Ket $|\psi\rangle$ $\doteq$ $\left[\begin{array}{l}a_{0} \\ a_{1}\end{array}\right]$, also called 'quantum state', <u>represents</u> the wave function of a quantum system ("Psi")**, consisting of **probability amplitudes**. This quantum state is a **normalized complex vector** (the squared moduli of its entries, not the entries themselves, sum to 1), written as a column vector.

> **A Ket is technically a pure quantum state**.

* Wave function notation to describe superposition of Pure States.

So, you have an (orthonormal) basis with 3 vectors and coefficients, to describe a vector in space (in Hilbert space):

> $\vec{v}=$<font color='blue'>$2$</font>$\left(\begin{array}{l}1 \\ 0 \\ 0\end{array}\right)$<font color='blue'>$+3$</font>$\left(\begin{array}{l}0 \\ 1 \\ 0\end{array}\right)$<font color='blue'>$+0$</font>$\left(\begin{array}{l}0 \\ 0 \\ 1\end{array}\right)$

The basis vectors, with all entries 0 except a single 1, correspond to possible measurement outcomes (for example spin up or down).

The coefficients can be collected in one vector and are complex numbers:

> $\vec{v}=$<font color='blue'>$\left(\begin{array}{l}2 \\ 3 \\ 0\end{array}\right)$</font>

An arbitrary state for a qubit can be written as a linear combination of the identity and the Pauli matrices, which together provide a basis for $2 \times 2$ self-adjoint matrices:

> $
\rho=\frac{1}{2}\left(I+r_{x} \sigma_{x}+r_{y} \sigma_{y}+r_{z} \sigma_{z}\right)
$

* where the real numbers $\left(r_{x}, r_{y}, r_{z}\right)$ are the coordinates of a point within the unit ball and

> $
\sigma_{x}=\left(\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right), \quad \sigma_{y}=\left(\begin{array}{cc}
0 & -i \\
i & 0
\end{array}\right), \quad \sigma_{z}=\left(\begin{array}{cc}
1 & 0 \\
0 & -1
\end{array}\right)
$
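
The coordinates $\left(r_{x}, r_{y}, r_{z}\right)$ can be read off a given density matrix via $r_{i}=\operatorname{Tr}\left(\rho \sigma_{i}\right)$, since the Pauli matrices are traceless and square to the identity. The sketch below illustrates this for a hypothetical mixed state $\rho=\operatorname{diag}(0.2, 0.8)$; the function names are assumptions.

```python
import numpy as np

identity = np.eye(2, dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """r_i = Tr(rho sigma_i) for i = x, y, z."""
    return np.array([np.trace(rho @ s).real for s in (sigma_x, sigma_y, sigma_z)])

def from_bloch_vector(r):
    """rho = (I + r_x sigma_x + r_y sigma_y + r_z sigma_z) / 2."""
    return 0.5 * (identity + r[0] * sigma_x + r[1] * sigma_y + r[2] * sigma_z)

rho = np.diag([0.2, 0.8]).astype(complex)   # hypothetical mixed state
r = bloch_vector(rho)                       # lies inside the unit ball
rho_rebuilt = from_bloch_vector(r)
print(r, np.linalg.norm(r))
```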


See also: [Projective Hilbert Space](https://en.m.wikipedia.org/wiki/Projective_Hilbert_space) with rays or projective rays

<font color="blue">*Multiply Ket with a Scalar: Vector-Scalar-Multiplication*

Vector-Scalar:

$\left[\begin{array}{l}x_{0} \\ x_{1}\end{array}\right] \otimes\left[y_{0}\right]=\left[\begin{array}{l}x_{0}\left[y_{0}\right] \\ x_{1}\left[y_{0}\right]\end{array}\right]=\left[\begin{array}{l}x_{0} y_{0} \\ x_{1} y_{0}\end{array}\right]$


<font color="blue">*Multiply Ket with another Ket: Ket - Tensor Product (Vector-Vector-Multiplication, Kronecker Product)*

> $\mathbf{uv}$ = $\left[\begin{array}{c}u_{1} \\ u_{2}\end{array}\right]$ $\otimes$ $\left[\begin{array}{c}v_{1} \\ v_{2} \end{array}\right]$ = $\left[\begin{array}{l}u_{1}\left[\begin{array}{l}v_{1} \\ v_{2}\end{array}\right] \\ u_{2}\left[\begin{array}{l}v_{1} \\ v_{2}\end{array}\right]\end{array}\right]$=  $\left[\begin{array}{c}u_{1} v_{1} \\ u_{1} v_{2}\\ u_{2} v_{1} \\ u_{2} v_{2}\end{array}\right]$

*Combining quantum states:*

> $\left[\begin{array}{l}1 \\ 0\end{array}\right]=|0\rangle, \quad\left[\begin{array}{l}0 \\ 1\end{array}\right]=|1\rangle$.

We choose two qubits in state $|0\rangle$:

> $|0\rangle \otimes|0\rangle = \left[\begin{array}{l}1 \\ 0\end{array}\right] \otimes\left[\begin{array}{l}1 \\ 0\end{array}\right]=$</font> $\left[\begin{array}{l}1\left[\begin{array}{l}1 \\ 0\end{array}\right] \\ 0\left[\begin{array}{l}1 \\ 0\end{array}\right]\end{array}\right]=$ $\left [\begin{array}{l}1 \cdot 1 \\ 1 \cdot 0 \\ 0 \cdot 1 \\ 0 \cdot 0\end{array}\right]$ = <font color="blue">$\left [\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]$

Qubits in two different states:

> $|0\rangle \otimes|1\rangle = \left[\begin{array}{l}1 \\ 0\end{array}\right] \otimes\left[\begin{array}{l}0 \\ 1\end{array}\right]=\left[\begin{array}{l}1\left[\begin{array}{l}0 \\ 1\end{array}\right] \\ 0\left[\begin{array}{l}0 \\ 1\end{array}\right]\end{array}\right]=\left[\begin{array}{l}0 \\ 1 \\ 0 \\ 0\end{array}\right]$
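
The two tensor products above can be reproduced with NumPy's Kronecker product `np.kron`; the variable names are illustrative.

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

ket00 = np.kron(ket0, ket0)   # |0> tensor |0>
ket01 = np.kron(ket0, ket1)   # |0> tensor |1>

# General rule: entry (2*i + j) of kron(u, v) equals u[i] * v[j]
u = np.array([2, 3])
v = np.array([5, 7])
uv = np.kron(u, v)
print(ket00, ket01, uv)
```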

**How do we represent kets in a particular basis?**

> <font color="blue">$|\psi\rangle=\sum_{i} c_{i}\left|u_{i}\right\rangle \quad$ where: $c_{i}=\left\langle u_{i} \mid \psi\right\rangle$

Video: [Representations in quantum mechanics](https://www.youtube.com/watch?v=rp2k2oR5ZQ8)


> $\left\{c_{i}\right\}$ are the representation of $|\psi\rangle$ in the $\left\{\left|u_{i}\right\rangle\right\}$ basis
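
As a sketch of this representation, the coefficients $c_{i}=\left\langle u_{i} \mid \psi\right\rangle$ can be computed with `np.vdot`, which conjugates its first argument just like a bra. The example state and the choice of the Pauli-X eigenbasis for $\left\{\left|u_{i}\right\rangle\right\}$ are assumptions for illustration.

```python
import numpy as np

psi = np.array([0.6, 0.8j])                      # hypothetical normalized ket

# Orthonormal basis {|u_1>, |u_2>}: eigenvectors of the Pauli X matrix
u1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
u2 = np.array([1, -1], dtype=complex) / np.sqrt(2)

# c_i = <u_i|psi>  (np.vdot conjugates its first argument)
c = np.array([np.vdot(u1, psi), np.vdot(u2, psi)])

# |psi> = sum_i c_i |u_i> recovers the original ket
reconstructed = c[0] * u1 + c[1] * u2

# Coefficients of the bra <psi| are <psi|u_i> = conj(c_i)
bra_coefficients = np.array([np.vdot(psi, u1), np.vdot(psi, u2)])
print(np.round(c, 3), np.round(reconstructed, 3))
```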

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_210.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_211.png)

* The coefficients a and b in Euclidean space play the same role as the expansion coefficients $c_i$ with respect to the basis $u_i$ in quantum state space. (The difference is the bra-ket: the bra enters the dot product with the ket as a complex conjugate, so the product is antilinear in its first argument!)

* expansion coefficients c are given by projection of the Ket onto the basis states u:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_212.png)

* Here we can take out $|\Psi\rangle$ because it doesn't explicitly depend on $i$:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_213.png)

**Additional Ket-Algebra**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_202.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_203.png)

For euclidean space: the scalar product is linear both in first and second argument:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_204.png)

For state space: the scalar product is only linear in the second argument (and antilinear in the first argument):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_205.png)

###### ***Bra $\langle \psi |$ - Dual Space $V^{*}$ (Covector)***

**How do we represent Bra's in a particular basis?**

* Bra $\langle \Psi|$ is an element of the dual vector space $V^*$

* Bra Psi times the identity operator ($\langle \Psi| * \mathbb{I}$), and then write the identity operator out (its resolution in the u basis)

* psi doesn't explicitly depend on i, so we can move it inside the summation, which brings us to this expression

> $\sum_{i}\left\langle\psi \mid u_{i}\right\rangle\left\langle u_{i}\right|$

* This is an expression for the bra psi in terms of the basis bra's u, and then the expansion coefficients are the brackets between psi and u $\left\langle\psi \mid u_{i}\right\rangle$


If we look at these expansion coefficients $\left\langle\psi \mid u_{i}\right\rangle$, we can use the conjugation property of the scalar product to rewrite them like this:

> $\left\langle\psi \mid u_{i}\right\rangle$ = $\left\langle u_{i} \mid \psi \right\rangle^*$ = $c_i^*$

**The expansion coefficients are the complex conjugates of the expansion coefficients of the Ket**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_214.png)

###### *Changing Basis*

**Changing Basis**

* <font color="blue">**For example: A particle's position state can be expressed as a superposition of momentum states**</font>

* Goal: choose a 'good' basis that makes the maths as simple as possible

Video: [Changing basis in quantum mechanics](https://www.youtube.com/watch?v=CDmXvPDMIFs)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_252.png)

If we go from one representation to another: we need to calculate the overlaps between the corresponding basis states (with overlap matrix):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_253.png)

How do we get back from the new to the old basis?

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_254.png)

How do we transform the representation of operators between bases? (resolve identities in the u basis)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_255.png)

Summary:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_256.png)
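
A minimal numerical sketch of a basis change with an overlap matrix follows. It assumes the old basis $u$ is the computational basis and the new basis $v$ is the Pauli-X eigenbasis, and defines the overlap matrix as $S_{ij}=\left\langle v_{i} \mid u_{j}\right\rangle$; the symbols $S$ and $v$ and the example values are assumptions and may differ from the notation in the images.

```python
import numpy as np

# Old basis u (computational) and new basis v (Pauli-X eigenbasis)
u = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]
v = [np.array([1, 1], dtype=complex) / np.sqrt(2),
     np.array([1, -1], dtype=complex) / np.sqrt(2)]

# Overlap matrix S_ij = <v_i|u_j>
S = np.array([[np.vdot(vi, uj) for uj in u] for vi in v])

c_u = np.array([0.6, 0.8j])          # representation of |psi> in the u basis
c_v = S @ c_u                        # representation in the v basis
c_back = S.conj().T @ c_v            # back to the u basis with the adjoint

# Operators transform as A_v = S A_u S^dagger
sigma_x_u = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_x_v = S @ sigma_x_u @ S.conj().T   # diagonal in its own eigenbasis
print(np.round(c_v, 3), np.round(sigma_x_v.real, 3))
```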

###### ***Bra-Ket $\langle \psi |\psi\rangle = c$ - Linear Form (Covector-Vector)*** *- also: Projective Measurement*

> **Bra-Ket - Projective Measurement (Covector-Vector-Multiplication, Born Rule)**

* Dirac delta function (a distribution!) = quantum measurement

* See also [wiki: Bra–ket notation](https://en.m.wikipedia.org/wiki/Bra–ket_notation)

* From 'Exterior algebra' $\rightarrow$ 'Multilinear Forms':

  * **A row vector can be thought of as a function (as a form), rather than a row vector, that acts on another vector.**

  * In Quantum mechanics: Linear functionals are particularly important in quantum mechanics. Quantum mechanical systems are represented by Hilbert spaces, which are [anti–isomorphic](https://en.m.wikipedia.org/wiki/Antiisomorphism) to their own dual spaces. A state of a quantum mechanical system can be identified with a linear functional. For more information see bra–ket notation.

  * Bra-Ket $\langle\psi \mid \psi\rangle$: **Covector-Vector-Multiplication**, Born Rule (Projective Measurement)

  * ⟨0∣1⟩ and ⟨1∣0⟩ give inner product 0 (the vectors are orthogonal to each other), e.g. $\langle 0 \mid 1\rangle=[1,0]\left[\begin{array}{l}0 \\ 1\end{array}\right] = 0$. And ⟨0∣0⟩ and ⟨1∣1⟩ give 1.

*See also: [dot product between a covector and a vector is a scalar, after all!](https://cafephysics35698708.wordpress.com/2017/12/23/moving-away-from-ortho-land-vectors-covectors-and-all-that/) but read also why [There are a couple ways to view a dot product as a linear map by changing your view slightly](https://math.stackexchange.com/questions/2856198/is-dot-product-a-kind-of-linear-transformation)*

* Inner Product / Bra-Ket: the bra is the conjugate transpose of the ket. You get a scalar as output.

* The Bra-Ket $\langle\psi \mid \psi\rangle$  represents the inner product in the Hilbert space

* Used to measure states in a basis (e.g. when I want to know the probability of obtaining the state 1)

* **A state of a quantum mechanical system can be identified with a linear functional**.

> $\mathbf{u}^{\top} \mathbf{v}=\left[\begin{array}{llll}u_{1} & u_{2} & \cdots & u_{n}\end{array}\right]\left[\begin{array}{c}v_{1} \\ v_{2} \\ \vdots \\ v_{n}\end{array}\right]=\left[u_{1} v_{1}+u_{2} v_{2}+\cdots+u_{n} v_{n}\right]$ = scalar

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_248.png)






**Case 1: Inner product of two basis vectors**:


* ⟨0∣1⟩ and ⟨1∣0⟩ give inner product 0 (they are orthogonal to each other), in detail for $\langle 0 \mid 1\rangle=[1,0]\left[\begin{array}{l}0 \\ 1\end{array}\right] = 0$

* ⟨0∣0⟩ and ⟨1∣1⟩ give inner product 1 (each vector multiplied with itself; the two computational basis vectors are orthonormal)

**Case 2: Compute the probability of measuring an eigenstate (Born's Rule)**

* The orthonormality properties are useful in the following example

* Consider a state with the wave function (amplitudes for both basis states) $|\psi\rangle=\frac{3}{5}|1\rangle+\frac{4}{5}|0\rangle$

* the probability of measuring 1 is then (here $|1\rangle$ is the basis vector, because it is the outcome I want to measure) [Source](https://docs.microsoft.com/de-de/azure/quantum/concepts-dirac-notation):

>$
|\langle 1 \mid \psi\rangle|^{2}=\left|\frac{3}{5}\langle 1 \mid 1\rangle+\frac{4}{5}\langle 1 \mid 0\rangle\right|^{2}=\frac{9}{25}
$

* because $\langle 1 \mid 0\rangle=0$ the second term drops out: $\frac{4}{5}\langle 1 \mid 0\rangle$ = $\frac{4}{5} * 0$

* Another example: $\left[\begin{array}{ll}2 & 1\end{array}\right]\left(\left[\begin{array}{l}x \\ y\end{array}\right]\right)=2 x+1 y$. **The covector [2, 1] can be thought of as a function, rather than a row vector, that acts on another vector.**

> <font color="red">*The probability of a particular measurement outcome is then the absolute square of the scalar product with the basis vector that corresponds to the outcome. This is <u>Born's Rule</u>. This scalar product of the wave function with a basis vector is also sometimes called a <u>projection</u> on that basis vector. The probability of measuring $X$ is:*

> $|\langle X \mid \Psi\rangle|^{2}=a_{1} a_{1}^{*}$

* where $\mid \Psi\rangle$ is the wavefunction of the probability superposition and $\langle X \mid$ is the basis vector of one outcome.
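The Case 2 calculation can be checked numerically. The following is a minimal sketch with NumPy, assuming kets are stored as 1-D complex arrays; the variable names are illustrative only.

```python
import numpy as np

# Computational basis kets |0> and |1>.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# |psi> = 3/5 |1> + 4/5 |0>
psi = (3 / 5) * ket1 + (4 / 5) * ket0

# np.vdot conjugates its first argument, so np.vdot(a, b) is the bra-ket <a|b>.
amp1 = np.vdot(ket1, psi)
amp0 = np.vdot(ket0, psi)

# Born's rule: probability = squared modulus of the amplitude.
p1 = abs(amp1) ** 2
p0 = abs(amp0) ** 2
print(p1, p0, p0 + p1)
```

`p1` reproduces the 9/25 from above, and `p0 + p1` sums to one, which is the normalization condition discussed in Case 3.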

**Case 3: Sum over all basis vectors**

> $\langle\Psi |\Psi\rangle$ = $ (a_{1}^*, a_{2}^*, a_{3}^*) \left(\begin{array}{l}a_{1} \\ a_{2} \\ a_{3}\end{array}\right)$ = $(a_{1}^*a_1 + a_{2}^*a_2 + a_{3}^*a_3)$

**The probability to get ANY measurement outcome is equal to one, which means that the sum over the squared scalar products with all basis vectors has to be one. This sum is just the squared length of the vector (all wave functions have length 1)**:

> $1=a_{1} a_{1}^{*}+a_{2} a_{2}^{*}+a_{3} a_{3}^{*}=\langle \Psi \mid \Psi\rangle$

* With this bra-ket notation it is now very easy to write dot products: the **inner product between Bra and Ket is $\langle\psi \mid \psi\rangle$ = 1** when the state is normalized (it is a particular way of writing the squared 2-norm for complex inputs; the Bra already contains the complex conjugate).

* With the dot product of bra and ket you will get a **scalar as a result**, like total probability is 1, here for the coefficients (probability amplitudes):

> $\langle\psi \mid \psi\rangle = \left|a_{0}\right|^{2}+\left|a_{1}\right|^{2}= 1$

* **We expand the Ket $\psi$ in the basis u**, where the expansion coefficients c are given by the Braket between u and psi:

> $|\psi\rangle=\sum_{i} c_{i}\left|u_{i}\right\rangle = c_{1} u_{1} + c_{2} u_{2} + \cdots + c_{n} u_{n} = c_{1}\left(\begin{array}{l}1 \\ 0 \\ 0\end{array}\right)+{c_{2}}\left(\begin{array}{l}0 \\ 1 \\ 0\end{array}\right)+ \cdots + c_{n}\left(\begin{array}{l}0 \\ 0 \\ 1\end{array}\right), \quad c_{i}=\left\langle u_{i} \mid \psi\right\rangle$

* <font color="red">These c coefficients are what we call the **representation of the Ket psi in the u basis**.

> $\left(\begin{array}{c}\langle u_{1} \mid \psi\rangle \\ \left\langle u_{2} \mid \psi\right\rangle \\ \vdots \\ \left\langle u_{i} \mid \psi\right\rangle \\ \vdots\end{array}\right)=\left(\begin{array}{c}c_{1} \\ c_{2} \\ \vdots \\ c_{i} \\ \vdots \\ \end{array}\right)$
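As a small illustration of the expansion, the sketch below computes the coefficients $c_i = \langle u_i \mid \psi\rangle$ for an example state and rebuilds the state from them. The choice of the $|+\rangle, |-\rangle$ basis and of the example state is an assumption made only for this sketch.

```python
import numpy as np

# An orthonormal basis {|u_1>, |u_2>}: here |+> and |-> (illustrative choice).
u = [
    np.array([1, 1], dtype=complex) / np.sqrt(2),
    np.array([1, -1], dtype=complex) / np.sqrt(2),
]

# A normalized example state written in the computational basis.
psi = np.array([3 / 5, 4j / 5], dtype=complex)

# Representation of |psi> in the u basis: c_i = <u_i|psi>.
c = np.array([np.vdot(u_i, psi) for u_i in u])

# Rebuild |psi> = sum_i c_i |u_i> from the coefficients.
psi_rebuilt = sum(c_i * u_i for c_i, u_i in zip(c, u))

print(c)
print(np.allclose(psi_rebuilt, psi))
```

The coefficients change with the basis, but the sum of their squared moduli stays 1 for a normalized state.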

> **Quantum Measurements from an Observable (via its Operator) - Use Case: What is the most likely Eigenvalue of an Observable like Momentum or Position?**

Video: [Measurements in quantum mechanics || Concepts](https://www.youtube.com/watch?v=u1R3kRWh1ek)

> **$\hat{A}\left|u_{n}\right\rangle=\lambda_{n}\left|u_{n}\right\rangle \longrightarrow P\left(\lambda_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}=\left|c_{n}\right|^{2}$**

> <font color="blue">**<u>Postulate II of quantum mechanics</u>: a physical quantity $\mathcal{A}$ is associated with a hermitian operator $\hat{A}$, that is called an observable.**

* Physical quantity $\mathcal{A} \longrightarrow$ Observable $\hat{A}$

* To understand what it means to measure $\hat{A}$ in quantum mechanics, the key equation to consider is the Eigenvalue equation of the operator $\hat{A}$ here:

> $
\hat{A}\left|u_{n}\right\rangle=\lambda_{n}\left|u_{n}\right\rangle
$

<font color ="blue">*$\rightarrow$ When you measure the observable on a system that is in an Eigenstate, you will get the Eigenvalue of this Eigenstate (postulate III of quantum mechanics). Challenge: in general we do not know in which Eigenstate our system $|\Psi\rangle$ will be found.*</font>

* with $\lambda_{n}$ Eigenvalues and $u_{n}$ Eigenstates

> <font color="blue">**<u>Postulate III of quantum mechanics</u>: The result of a measurement of a physical quantity is one of the Eigenvalues of the associated observable.**


* This means: when we want to measure property $\hat{A}$ we first need to solve the corresponding Eigenvalue equation, which allows us to find all Eigenvalues $\lambda_{1}$, $\lambda_{2}$, .. $\lambda_{n}$

* So the operator $\hat{A}$ encodes all the possible outcomes of the measurement, irrespective of what the state of the system is.

* **So the question is: if we measure $\hat{A}$ in state $|\Psi\rangle$: which Eigenvalue will we get?** (Postulate III will tell us we will get one of the Eigenvalues $\lambda_{n}$, but not which one)

> <font color="blue">**<u>Postulate IV of quantum mechanics</u>: The measurement of $\mathcal{A}$ in a system in normalized state $|\psi\rangle$ gives eigenvalue $\lambda_{n}$ with probability:**


> $
P\left(\lambda_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}
$ $\quad (= \left|c_{n}\right|^{2})$

<font color ="blue">*$\rightarrow$ $u_n$ are the possible Eigenstates onto which we project our actual system state $|\Psi\rangle$. The squared modulus of each projection is the probability of obtaining the corresponding Eigenvalue $\lambda_{n}$; the largest value marks the most likely outcome. For example we will get $\lambda_{1}$ with probability $\left|\left\langle u_{1} \mid \psi\right\rangle\right|^{2}$*</font>


* This means: for an operator $\hat{A}$ we have a list of Eigenvalues, where each of them corresponds to an Eigenstate:

  * $\lambda_{1}$ for $|u_1\rangle$,

  * $\lambda_{2}$ for $|u_2\rangle$,

  * ...

  * $\lambda_{n}$ for $|u_n\rangle$

* These relations and values come from the Eigenvalue equation of the specific operator $\hat{A}$ and are independent of the state of our system.

* **But when we measure $\mathcal{A}$, we measure it in a specific state $|\Psi\rangle$**. We will get $\lambda_{1}$ with probability $\left|\left\langle u_{1} \mid \psi\right\rangle\right|^{2}$

* So what we get as result depends (a) on the intrinsic properties of $\mathcal{A}$ with its Eigenvalues and Eigenstates, and (b) on the specific state $|\Psi\rangle$ in which our system is.

* So postulate 4 does not tell us the precise outcome of a measurement; it tells us the probability associated with any given outcome

> $\begin{array}{cccccc} \hat{A}: \quad  & \lambda_{1} & \lambda_2 & \lambda_3 & \lambda_{n} \\ & \left|u_{1}\right\rangle & \left|u_{2}\right\rangle & \left|u_{3}\right\rangle & \left|u_{n}\right\rangle \\ |\psi\rangle: & \left|\left\langle u_{1} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{2} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{3} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}\end{array}$
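The table can be reproduced for a concrete case. This is a minimal sketch, assuming the Pauli X matrix as the observable $\hat{A}$ and an arbitrary example state; `np.linalg.eigh` returns the eigenvalues in ascending order and the eigenvectors as columns.

```python
import numpy as np

# Example observable: Pauli X (Hermitian), chosen only for illustration.
A = np.array([[0, 1], [1, 0]], dtype=complex)

# Example normalized state |psi> = cos(pi/8)|0> + sin(pi/8)|1>.
psi = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)], dtype=complex)

# Solve the eigenvalue equation A|u_n> = lambda_n |u_n>.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# P(lambda_n) = |<u_n|psi>|^2; row n of eigenvectors^dagger is the bra <u_n|.
probabilities = np.abs(eigenvectors.conj().T @ psi) ** 2

for lam, p in zip(eigenvalues, probabilities):
    print(f"lambda = {lam:+.0f}   P = {p:.4f}")
```

The eigenvalues depend only on $\hat{A}$, while the probabilities change as soon as `psi` is changed.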


**How does the state $|\Psi\rangle$ of a system encode the possible outcomes of a measurement?**

> <font color ="blue">$\hat{A}\left|u_{n}\right\rangle=\lambda_{n}\left|u_{n}\right\rangle \longrightarrow P\left(\lambda_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$</font>

* We can write the state $|\Psi\rangle$ in a complete basis of our state space, and the Eigenstates $u_n$ of a Hermitian operator like $\hat{A}$ provide such a basis

* this means we can write $|\Psi\rangle$ in the u-Basis (Eigenstates) like this:

> $|\psi\rangle=\sum_{n} c_{n}\left|u_{n}\right\rangle$

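<font color ="blue">$\rightarrow$ $\left|c_{n}\right|^{2}$ is the probability of obtaining the Eigenvalue $\lambda_n$ that belongs to Eigenstate $\left|u_{n}\right\rangle$, so the modulus $|c_n|$ is the square root of that probability ($c_n$ itself is a complex amplitude and also carries a phase). These are the probabilities in the table from above: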

> $\begin{array}{cccccc} \hat{A}: \quad  & \lambda_{1} & \lambda_2 & \lambda_3 & \lambda_{n} \\ & \left|u_{1}\right\rangle & \left|u_{2}\right\rangle & \left|u_{3}\right\rangle & \left|u_{n}\right\rangle \\ |\psi\rangle: & \left|\left\langle u_{1} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{2} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{3} \mid \psi\right\rangle\right|^{2} & \left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}\end{array}$

* and these expansion coefficients c which we call **the representation of $|\Psi\rangle$ in the u-Basis**, are given by the projection of $|\Psi\rangle$ onto the u-Basis states:

> $c_{n}=\left\langle u_{n} \mid \psi\right\rangle$

* This means we can rewrite the probability of measuring Eigenvalue $\lambda_m$ as the absolute value of $c_m$ squared:

> $P\left(\lambda_{m}\right)=\left|\left\langle u_{m} \mid \psi\right\rangle\right|^{2}=\left|c_{m}\right|^{2}$

<font color ="blue">$\rightarrow$ For example we will get $\lambda_{1}$ with probability $\left|\left\langle u_{1} \mid \psi\right\rangle\right|^{2}$

**What does this mean?**

* Imagine our system is in state $|\Psi\rangle$ and we want to measure a property $\hat{A}$

> $|\psi\rangle \longrightarrow \hat{A} \longrightarrow ?$

* Then what we do is to write the state $|\Psi\rangle$ in the basis of Eigenstates of $\hat{A}$, and <font color ="blue">the expansion coefficient $c_m$ tells us the relative contribution of Eigenstate $|u_m\rangle$ to state $|\Psi\rangle$, which in turn tells us how likely it is to measure the associated Eigenvalue $\lambda_m$</font>

> $|\psi\rangle=\sum_{m} c_{m}\left|u_{m}\right\rangle$

* with $c_{m}=\left\langle u_{m} \mid \psi\right\rangle$

> $|\psi\rangle=\sum_{m} \left\langle u_{m} \mid \psi\right\rangle\left|u_{m}\right\rangle$

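* <font color="red">Because $\left\langle u_{m} \mid \psi\right\rangle$ is just a complex number, it can be moved to the right of the ket, which turns each term into an outer product $|u_{m}\rangle\langle u_{m}|$ acting on $|\psi\rangle$:

> $|\psi\rangle=\sum_{m} |u_{m}\rangle \langle u_{m}| \psi\rangle$

Since this holds for every $|\psi\rangle$, the outer products sum to the identity, $\sum_{m} |u_{m}\rangle \langle u_{m}| = \mathbb{1}$ (completeness relation). Each single term $|u_{m}\rangle \langle u_{m}|$ is the outer product of one eigenvector with itself and projects onto that eigenvector, which is how a measurement with outcome $\lambda_m$ is modelled.

This matches what follows further below (Professor M's video on projection operators):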

> $(|\varphi\rangle\langle\psi|)|x\rangle=|\varphi\rangle(\langle\psi \mid x\rangle)=a|\varphi\rangle$
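Both statements, the rearrangement into outer products and the action of a ket-bra on a ket, can be verified numerically. This is a minimal sketch; the basis and the state are arbitrary examples chosen for illustration.

```python
import numpy as np

# Orthonormal example basis |u_1>, |u_2> and an example state.
u = [
    np.array([1, 1], dtype=complex) / np.sqrt(2),
    np.array([1, -1], dtype=complex) / np.sqrt(2),
]
psi = np.array([0.6, 0.8j], dtype=complex)

# Completeness: sum_m |u_m><u_m| should be the identity matrix.
identity = sum(np.outer(u_m, u_m.conj()) for u_m in u)

# (|phi><chi|)|x> = <chi|x> |phi>, with the scalar a = <chi|x>.
phi, chi, x = u[0], u[1], psi
lhs = np.outer(phi, chi.conj()) @ x
a = np.vdot(chi, x)
rhs = a * phi

print(np.allclose(identity, np.eye(2)))
print(np.allclose(lhs, rhs))
```

Because the sum of outer products is the identity, applying it to `psi` returns `psi` unchanged, which is exactly the rearranged expansion.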

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_199.png)

Imagine we have $N$ copies of the state $|\Psi\rangle$ and measure $\hat{A}$ on each. Every single measurement returns one of the Eigenvalues from the Eigenvalue equation, but different Eigenvalues occur with different probabilities. If $\lambda_m$ is obtained $N_m$ times, the relative frequency $N_m / N$ approaches the probability $P(\lambda_m)$ as $N \rightarrow \infty$ (postulate 4):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_200.png)
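The frequency interpretation can be simulated classically by drawing outcomes with the Born-rule probabilities. The sketch below assumes the example probabilities $|c_0|^2 = 0.64$ and $|c_1|^2 = 0.36$ from Case 2; the seed and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Born-rule probabilities of the two outcomes (example values).
probabilities = np.array([0.64, 0.36])

for n_copies in (10, 1_000, 100_000):
    # One simulated measurement per copy of the state.
    outcomes = rng.choice(2, size=n_copies, p=probabilities)
    # Relative frequency N_m / N of each outcome.
    frequencies = np.bincount(outcomes, minlength=2) / n_copies
    print(n_copies, frequencies)
```

For small `n_copies` the frequencies scatter noticeably; for large values they settle near the probabilities.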

**A special case arises if our system with state $|\Psi\rangle$ is in an Eigenstate of the property that we are measuring (the operator $\hat{A}$, for example momentum, position etc), say $|u_m\rangle$**:

> $|\psi\rangle=\left|u_{m}\right\rangle$

* This corresponds to the expansion coefficient $c_m$ = 1, while all other coefficients vanish.

* In this particular case we do know with absolute certainty what the outcome of the measurement will be. Probability = 1:

> $P\left(\lambda_{m}\right)=\left|c_{m}\right|^{2}=1$

**This means: If the system is in an Eigenstate of the property that we are measuring (=momentum, position etc, measured by an operator), then the outcome of the measurement is the associated Eigenvalue with probability 1.**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_287.png)

###### ***Ket-Bra $|\psi\rangle \langle \psi | = P$ - Linear Map (Vector-Covector) - Projection Operator***

**Use Case 1: Projection Operator (Outer Product)**

**A measurement outcome is actually a projection (to the first basis vector)**

* So if we want to model that we get the outcome 0, then we take the corresponding projection $|0\rangle\langle 0|$ and apply it to the quantum state: $|0\rangle\langle 0 \mid \Phi\rangle$

* this is how you pull out samples from a quantum state and how you apply measurement to this particular probability distribution

> The probability of measuring a value with probability amplitude $\phi$ is $1 \geq|\phi|^{2} \geq 0$, where $|\cdot|$ is the [modulus](https://en.m.wikipedia.org/wiki/Absolute_value#Complex_numbers).

**Ket-Bra (also: Projection Operator, Outer Product or Density Matrix)**

* Use Case 1: used for mixed states

* Use Case 2: used when I want to measure a specific Eigenstate / Eigenvector to get its probability.

* Is a pair: Vector-Covector

> $\mathbf{u v}^{T}=\left[\begin{array}{c}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{array}\right]\left[\begin{array}{llll}v_{1} & v_{2} & \cdots & v_{n}\end{array}\right]$= $\left[\left[\begin{array}{c}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{array}\right] v_{1}\left[\begin{array}{c}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{array}\right] v_{2} \ldots \left[\begin{array}{c}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{array}\right] v_{n} \right]$ = $\left[\begin{array}{cccc}u_{1} v_{1} & u_{1} v_{2} & \cdots & u_{1} v_{n} \\ u_{2} v_{1} & u_{2} v_{2} & \cdots & u_{2} v_{n} \\ \vdots & \vdots & & \vdots \\ u_{n} v_{1} & u_{n} v_{2} & \cdots & u_{n} v_{n}\end{array}\right]$

> $|0\rangle\langle 0| =$ $\left[\begin{array}{l}1 \\ 0\end{array}\right]\left[\begin{array}{ll}1 & 0\end{array}\right]=\left[\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right]$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_307.jpg)

From this video: https://youtu.be/U6fn5LvevEE

**Example: if you want to measure with which probability the quantum system is in state 0, you apply $|0\rangle \langle0|$ operator which is $\hat{P}=\left(\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right)$, because: $|0\rangle\langle 0|=\left[\begin{array}{l}1 \\ 0\end{array}\right]\left[\begin{array}{ll}1 & 0\end{array}\right]=\left[\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right]$**

* "Projector", "projection" and "projection operator" are the same thing. In quantum mechanics, one usually defines a projection operator as

> $
\hat{P}=|\psi\rangle\langle\psi|
$

This operator then acts on quantum states (vectors) $|\Psi\rangle$ as

> $
\hat{P}|\Psi\rangle=|\psi\rangle\langle\psi \mid \Psi\rangle=\langle\psi \mid \Psi\rangle|\psi\rangle
$

This is the same projector as in the matrix form above: in the computational basis, $|0\rangle\langle 0|$ is a matrix whose only nonzero entry is a 1 on the diagonal. For example, if $|\Psi\rangle=\alpha|0\rangle+\beta|1\rangle$ and $|\psi\rangle=|0\rangle$, we would find that the projector picks out the $|0\rangle$ component of the state

> $
\hat{P}|\Psi\rangle=\alpha|0\rangle
$

In matrix form this gives exactly the same result, since now

> $
\hat{P}=\left(\begin{array}{ll}
1 & 0 \\
0 & 0
\end{array}\right), \quad|\Psi\rangle=\left(\begin{array}{l}
\alpha \\
\beta
\end{array}\right)
$

And now

> $
\hat{P}|\Psi\rangle=\left(\begin{array}{ll}
1 & 0 \\
0 & 0
\end{array}\right)\left(\begin{array}{l}
\alpha \\
\beta
\end{array}\right)=\left(\begin{array}{l}
\alpha \\
0
\end{array}\right)
$

https://physics.stackexchange.com/questions/394258/what-is-the-standard-definition-of-projector-projection-and-projection-ope
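The projector example can be replayed with concrete numbers. This is a minimal sketch, assuming example amplitudes $\alpha = 0.6$ and $\beta = 0.8i$; it also checks the defining property $\hat{P}^2 = \hat{P}$ and the link to the measurement probability.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)

# Projector P = |0><0| as an outer product of the ket with its bra.
P0 = np.outer(ket0, ket0.conj())

# Example state |Psi> = alpha|0> + beta|1> (values chosen for illustration).
alpha, beta = 0.6, 0.8j
Psi = np.array([alpha, beta], dtype=complex)

# Applying the projector keeps only the |0> component: alpha|0>.
projected = P0 @ Psi

# Probability of outcome 0: <Psi|P|Psi> = |alpha|^2.
prob0 = np.vdot(Psi, P0 @ Psi).real

print(P0.real)
print(projected, prob0)
```

Note that `projected` is no longer normalized; its squared length equals `prob0`.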

The density matrix is a representation of a linear operator called the density operator. The density matrix is obtained from the density operator by choice of basis in the underlying space. In practice, the terms density matrix and density operator are often used interchangeably.

In operator language, a density operator for a system is a positive semidefinite, Hermitian operator of trace one acting on the Hilbert space of the system. This definition can be motivated by considering a situation where a pure state $\left|\psi_{j}\right\rangle$ is prepared with probability $p_{j}$, known as an ensemble. The probability of obtaining projective measurement result $m$ when using projectors $\Pi_{m}$ is given by

> $
p(m)=\sum_{j} p_{j}\left\langle\psi_{j}\left|\Pi_{m}\right| \psi_{j}\right\rangle=\operatorname{tr}\left[\Pi_{m}\left(\sum_{j} p_{j}\left|\psi_{j}\right\rangle\left\langle\psi_{j}\right|\right)\right]
$

which makes the density operator, defined as

> <font color="red">$
\rho=\sum_{j} p_{j}\left|\psi_{j}\right\rangle\left\langle\psi_{j}\right|
$

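a convenient representation for the state of this ensemble. Read the sum as: the probability of having prepared the state |0> times its outer product / density matrix $\hat{P}=\left(\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right)$, PLUS the probability of having prepared the state |1> times its outer product, and so on for every state in the ensemble.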
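The equality between the ensemble average and the trace formula can be checked directly. The sketch below uses a hypothetical two-state ensemble (25% $|0\rangle$, 75% $|+\rangle$); the helper `projector` is introduced only for this example.

```python
import numpy as np

def projector(ket):
    """Outer product |ket><ket|."""
    return np.outer(ket, ket.conj())

ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Hypothetical ensemble of (p_j, |psi_j>) pairs.
ensemble = [(0.25, ket0), (0.75, ket_plus)]

# Density operator rho = sum_j p_j |psi_j><psi_j|.
rho = sum(p * projector(ket) for p, ket in ensemble)

# Projector for measurement outcome m = 0.
Pi0 = projector(ket0)

# Left-hand side: sum_j p_j <psi_j|Pi_m|psi_j>.
p_direct = sum(p * np.vdot(ket, Pi0 @ ket).real for p, ket in ensemble)

# Right-hand side: tr(Pi_m rho).
p_trace = np.trace(Pi0 @ rho).real

print(p_direct, p_trace)
```

Once `rho` is known, the individual states of the ensemble are no longer needed to predict measurement statistics.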

<font color="red">*Projection Operator (Outer Product) & Eigenvalues in Systems of Linear Equations (HHL)*

Problem Statement:

> $A|x\rangle=|b\rangle \quad$ (System of linear equations in a quantum state)

And we want to find this:

>$
|x\rangle=A^{-1}|b\rangle \quad \text { (the solution is: }|x\rangle=\sum_{j=0}^{N-1} \lambda_{j}^{-1} b_{j}\left|u_{j}\right\rangle \text { ) }
$

We need to find the inverse matrix $A^{-1}$. We can get the matrix inverse via eigendecomposition. Since $A$ is Hermitian (normal!), it has a spectral decomposition:

> <font color="blue">$
A=\sum_{j=0}^{N-1} \lambda_{j}\left|u_{j}\right\rangle\left\langle u_{j}\right|, \quad \lambda_{j} \in \mathbb{R}
$

(To pick out one Eigenvector, I take its outer product $\left|u_{j}\right\rangle\left\langle u_{j}\right|$ as above; this projector acts like a measurement that selects the $\left|u_{j}\right\rangle$ component.)

where $\left|u_{j}\right\rangle$ is the $j^{t h}$ eigenvector of $A$ with respective eigenvalue $\lambda_{j}$. Then,

> $
A^{-1}=\sum_{j=0}^{N-1} \lambda_{j}^{-1}\left|u_{j}\right\rangle\left\langle u_{j}\right|
$
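The spectral construction of $A^{-1}$ can be tried out classically on a small matrix. This sketch is not the HHL algorithm itself, only the linear algebra it relies on; the matrix `A` and vector `b` are arbitrary examples, and the quantum state $|x\rangle$ would additionally be normalized.

```python
import numpy as np

# Small Hermitian, invertible example matrix and right-hand side.
A = np.array([[2, 1], [1, 3]], dtype=complex)
b = np.array([1, 0], dtype=complex)

# Spectral decomposition: eigenvalues lam[j], eigenvectors U[:, j].
lam, U = np.linalg.eigh(A)

# A^{-1} = sum_j lam_j^{-1} |u_j><u_j|
A_inv = sum(
    (1 / lam_j) * np.outer(U[:, j], U[:, j].conj())
    for j, lam_j in enumerate(lam)
)
x = A_inv @ b

# Same solution from the coefficients b_j = <u_j|b>:
# |x> = sum_j lam_j^{-1} b_j |u_j>
b_coeffs = U.conj().T @ b
x_from_sum = U @ (b_coeffs / lam)

print(x)
print(np.allclose(A @ x, b))
```

Small eigenvalues are amplified by the inversion, which is why the conditioning of $A$ matters for this approach.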

<font color="blue">**Case 1 - Bra-Ket (Inner Product)**: *I want to know which Eigenvalue of an observable (position, momentum etc.) a measurement will return, but I don't know in which Eigenstate the system will be found. Then I take the inner product of each Eigenvector of the observable's operator with the quantum state. Its squared modulus is the probability of the associated Eigenvalue, and the largest one marks the most probable Eigenvector / Eigenstate.*

> $\hat{A}\left|u_{n}\right\rangle=\lambda_{n}\left|u_{n}\right\rangle \longrightarrow P\left(\lambda_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}=\left|c_{n}\right|^{2}$

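= measuring an observable on a quantum state returns one of its Eigenvalues $\lambda_n$ with probability $|c_n|^2$ (here $u_n$ is the corresponding eigenstate / eigenvector); only if the state already is $|u_n\rangle$ does applying $\hat{A}$ simply multiply it by $\lambda_n$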

<font color="blue">**Case 2 - Ket-Bra (Outer Product)**: *I want to measure a specific Eigenstate / Eigenvector to get its probability? Then I take the outer product / density matrix of the desired outcome and make a measurement on the quantum state / system (=project it onto the desired Eigenvector)*

> $|\psi\rangle=\sum_{m} c_{m}\left|u_{m}\right\rangle$

*  with $c_{m}=\left\langle u_{m} \mid \psi\right\rangle$

> $|\psi\rangle=\sum_{m}\left\langle u_{m} \mid \psi\right\rangle\left|u_{m}\right\rangle$

* here you can re-arrange and get an outer product, because $\left\langle u_{m} \mid \psi\right\rangle$ is a scalar and can be moved to the right of the ket:

> $
|\psi\rangle=\sum_{m}\left|u_{m}\right\rangle\left\langle u_{m} \mid \psi\right\rangle$

= create outer product of a certain eigenvector to take a measurement of it


* This matches what follows further below (Professor $m$'s video on projection operators):

> $
(|\varphi\rangle\langle\psi|)|x\rangle=|\varphi\rangle(\langle\psi \mid x\rangle)=a|\varphi\rangle
$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_250.png)

Video: [Projection operators in quantum mechanics](https://www.youtube.com/watch?v=M9V4hhqyrKQ)

> **Projection Operators project one quantum state on another or onto a subspace of the state space**

* for example: when we measure a property of a quantum particle, then the state of the particle collapses onto a different state

* the projection operator mathematically describes this collapse

> **projection operator associated with a Ket Psi: $| \Psi \rangle$ is the outer product of psi with itself: $|\Psi\rangle \langle \Psi|$**

* then we check what it does to an arbitrary state phi

* the projection operator projects an arbitrary state (here Phi) onto the reference state (here Psi).

* the proportionality constant C is given by the overlap between the initial state phi and the state psi that defines the projection operator

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_264.png)

Properties of the projection operator:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_265.png)

Eigenvalues and Eigenstates of Projection operator:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_266.png)

The Projection operator allows us to write any Ket as a sum of two other Kets:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_267.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_268.png)

One common way in which the projection operator is used in quantum mechanics is to project onto a subspace of the whole state space.

* Most convenient way to write this down is in terms of basis states of our state space.

* For a basis u we consider a subset of basis states $u_1$ to $u_n$, which span an n-dimensional subspace of the full state space.

* We then write the projection operator onto this n-dimensional subspace as $\hat{P}_n=\sum_{i=1}^{n}\left|u_{i}\right\rangle\left\langle u_{i}\right|$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_269.png)

Summary:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_270.png)

**Projective measurement (PVM - Projection-valued measure)**

* See [this article](https://en.m.wikipedia.org/wiki/Measurement_in_quantum_mechanics#Projective_measurement) and longer article on [PVM](https://en.m.wikipedia.org/wiki/Projection-valued_measure)

* The eigenvectors of a von Neumann observable form an orthonormal basis for the Hilbert space, and each possible outcome of that measurement corresponds to one of the vectors comprising the basis. A density operator is a positive-semidefinite operator on the Hilbert space whose trace is equal to 1.

> <font color ="red">**For each measurement that can be defined, the probability distribution over the outcomes of that measurement can be computed from the density operator. The procedure for doing so is the Born rule, which states that**</font>

> $
P\left(x_{i}\right)=\operatorname{tr}\left(\Pi_{i} \rho\right)
$

* where $\rho$ is the density operator, and $\Pi_{i}$ is the [projection operator](https://en.m.wikipedia.org/wiki/Projection_(linear_algebra)) onto the basis vector corresponding to the measurement outcome $x_{i}$.

* The average of the eigenvalues of a von Neumann observable, weighted by the Born-rule probabilities, is the [expectation value](https://en.m.wikipedia.org/wiki/Expectation_value_(quantum_mechanics)) of that observable. For an observable $A$, the expectation value given a quantum state $\rho$ is

>$
\langle A\rangle=\operatorname{tr}(A \rho)
$

* A density operator that is a rank-1 projection is known as a pure quantum state, and all quantum states that are not pure are designated mixed. Pure states are also known as wavefunctions.
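The two trace formulas are consistent with each other: weighting the eigenvalues with the Born-rule probabilities gives the same number as $\operatorname{tr}(A \rho)$. A minimal sketch, assuming Pauli Z as the observable and an arbitrary valid density matrix:

```python
import numpy as np

# Example observable (Pauli Z) and example density matrix
# (Hermitian, trace 1, positive semidefinite).
Z = np.array([[1, 0], [0, -1]], dtype=complex)
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)

eigenvalues, eigenvectors = np.linalg.eigh(Z)

# Born rule P(x_i) = tr(Pi_i rho) with Pi_i = |u_i><u_i|.
probs = []
for i in range(len(eigenvalues)):
    u_i = eigenvectors[:, i]
    Pi_i = np.outer(u_i, u_i.conj())
    probs.append(np.trace(Pi_i @ rho).real)
probs = np.array(probs)

# Expectation value in two ways.
expectation_from_probs = float(np.sum(eigenvalues * probs))
expectation_from_trace = np.trace(Z @ rho).real

print(probs, expectation_from_probs, expectation_from_trace)
```

The off-diagonal entries of `rho` do not influence the result here, because Z is diagonal in the computational basis.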

###### ***Ket-Bra $|\psi\rangle \langle \psi |$ - Mixed States (Density Matrix)***

***Use Case 2: Mixed States (Density Matrix)***

[Quantum statistics](https://de.m.wikipedia.org/wiki/Quantenstatistik) applies the methods and concepts of classical statistical physics to the study of macroscopic systems and additionally takes into account the quantum-mechanical peculiarities in the behaviour of the particles.

Like quantum mechanics, quantum statistics accounts for the following twofold lack of knowledge:

> 1. If the state of a system is known exactly, i.e. it is a **pure state**, and this state is not an eigenstate of the observable, then the result of a single measurement still cannot be predicted exactly.

* A pure state can be written in terms of a Ket state $|\psi\rangle$ $\doteq$ $\left[\begin{array}{l}a_{0} \\ a_{1}\end{array}\right]$

  * <font color ="blue">$|\psi\rangle$</font> $=\sum_{i} c_{i}\left|\varphi_{i}\right\rangle$

  * Spin up or spin down = **pure states**, can be written as a single wave function $\mid \psi\rangle$.

> 2. If the state of the system is not known exactly, one has to assume a **mixed state**. The same applies when one wants to describe a physical system which is entangled with another, as its state cannot be described by a pure state, and to qubits affected by noise.

* A mixed state can be described with a density matrix:

  * <font color ="blue">$\rho$ </font>$=\sum_{i} p_{i}\left|\varphi_{i}\right\rangle\left\langle\varphi_{i}\right|$
  * Sum over probabilities times the corresponding projection operators onto certain basis states $\varphi_{i}$

  * It allows for the calculation of the probabilities of the outcomes of any measurement performed upon this system, using the Born rule.

**There are two more ways to distinguish a pure state from a mixed state:**

1. Take the trace of the square of the density matrix:

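  * $\operatorname{Tr}\left[\rho^{2}\right]=1 \rightarrow$ Pure state (the eigenvalues of $\rho$ are then 1, 0, 0, 0, ..., so the sum of their squares is 1)

  * $\operatorname{Tr}\left[\rho^{2}\right]< 1 \rightarrow$ Mixed state. <font color="red">Note that $\operatorname{Tr}[\rho]=1$ holds for every density matrix, pure or mixed, with or without off-diagonal elements; the test uses the trace of the *square*. For the maximally mixed qubit state $\rho=\frac{1}{2}\mathbb{1}$, for instance, $\operatorname{Tr}[\rho]=1$ but $\operatorname{Tr}[\rho^{2}]=\frac{1}{2}$.</font>

2. Geometrically, pure states lie on the surface of the Bloch sphere. Mixed states lie strictly inside the Bloch sphere.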
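Both criteria can be evaluated for two concrete qubit states: the pure state $|+\rangle\langle+|$ (all entries 0.5) and the maximally mixed state $\frac{1}{2}\mathbb{1}$. This is a minimal sketch; the helper names `purity` and `bloch_vector` are introduced only for this example.

```python
import numpy as np

# Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def purity(rho):
    """Tr[rho^2]: 1 for pure states, < 1 for mixed states."""
    return np.trace(rho @ rho).real

def bloch_vector(rho):
    """Bloch vector r = (tr(rho X), tr(rho Y), tr(rho Z))."""
    return np.array([np.trace(rho @ P).real for P in (X, Y, Z)])

rho_pure = np.full((2, 2), 0.5, dtype=complex)   # |+><+|
rho_mixed = np.eye(2, dtype=complex) / 2         # maximally mixed state

for name, rho in (("pure", rho_pure), ("mixed", rho_mixed)):
    r = bloch_vector(rho)
    print(name, np.trace(rho).real, purity(rho), np.linalg.norm(r))
```

Both states have trace 1; only the purity and the length of the Bloch vector tell them apart.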

> $
\begin{array}{c|c|c}
& \begin{array}{c}
\text { Pure } \\
\text { State }
\end{array} & \begin{array}{c}
\text { Mixed } \\
\text { State }
\end{array} \\
\hline \begin{array}{c}
\text { Density } \\
\text { Matrix } \\
|\psi\rangle\langle\psi|
\end{array} & \color{green} ✔ & \color{green}✔ \\
\hline \begin{array}{c}
\text { Wave } \\
\text { Function } \\
|\Psi\rangle
\end{array} & \color{green}✔ & \color{red} ❌
\end{array}
$

**Why do we need this alternative formalism, these density matrices?**

* To illustrate the difference, think about this Ket $|\psi\rangle$, which is the equal superposition of 0 and 1. We can write out the vector form $\left[\begin{array}{l}1 / \sqrt{2} \\ 1 / \sqrt{2}\end{array}\right]$. And if we write the corresponding $\rho$, it will have 0.5 for every element in the matrix.

> $|\psi\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)=\left[\begin{array}{l}1 / \sqrt{2} \\ 1 / \sqrt{2}\end{array}\right] \rightarrow \rho=\left[\begin{array}{ll}0.5 & 0.5 \\ 0.5 & 0.5\end{array}\right]$

* On the other hand, if we mix the density matrix corresponding to the 0-Ket and the density matrix corresponding to the 1-Ket with equal probability, then this density matrix will be different:

> $\rho^{\prime}=\frac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|)=\left[\begin{array}{ll}0.5 & 0 \\ 0 & 0.5\end{array}\right]$

* It doesn't have off-diagonal elements. These off-diagonal elements, sometimes also called 'coherences', are critical for many quantum operations. $\rho^{\prime}$ is called a maximally mixed state = the equivalent of a uniform distribution in classical probability theory. This means we have absolutely no predictive power of what's going to happen next. The entropy of the state is maximal.

* Ideally we want quantum states with a high coherence, but in reality noise affects the system and these coherences disappear.
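The difference between $\rho$ and $\rho^{\prime}$ shows up as soon as we measure in a basis other than the computational one. The sketch below compares outcome probabilities and the von Neumann entropy (in bits); the helper functions are introduced only for this illustration.

```python
import numpy as np

rho = np.full((2, 2), 0.5, dtype=complex)     # equal superposition |+><+|
rho_prime = np.eye(2, dtype=complex) / 2      # maximally mixed state

ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

def prob(rho, ket):
    """Probability of finding `ket`: <ket|rho|ket> = tr(|ket><ket| rho)."""
    return np.vdot(ket, rho @ ket).real

def von_neumann_entropy(rho):
    """S = -sum_i l_i log2(l_i) over the nonzero eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

# In the 0/1 basis both states look the same (50/50) ...
print(prob(rho, ket0), prob(rho_prime, ket0))
# ... but in the +/- basis the coherences make the outcome certain.
print(prob(rho, ket_plus), prob(rho_prime, ket_plus))
print(von_neumann_entropy(rho), von_neumann_entropy(rho_prime))
```

The maximally mixed state gives 50/50 in every basis and has an entropy of one bit, while the superposition has zero entropy.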

https://www.youtube.com/watch?v=BE8RxAESx5I&list=PLBn8lN0Dcvpla6a6omBni1rjyQJ4CssTP&index=47


**Explanation 1: Density Matrix for Mixed States**

*Source here point 2.1: https://qiskit.org/textbook/ch-quantum-hardware/density-matrix.html*

* Let's consider the case where we initialize a qubit in the |0⟩ state, and then apply a Hadamard gate.

* **Now, unlike the scenario we described for pure states, this Hadamard gate is not ideal: due to errors in the quantum-computer hardware, it produces the desired state only 80% of the times the state is prepared**:

> $\left|\psi_{1}\right\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$

* The remaining $20 \%$ of the times, the pulse applied to rotate the state is either too short or too long by $\frac{\pi}{6}$ radians about the $x$-axis. This means that when we use this Hadamard gate, we could end up with following two undesired outcome states:

>$
\left|\psi_{2}\right\rangle=\frac{\sqrt{3}}{2}|0\rangle+\frac{1}{2}|1\rangle, \quad\left|\psi_{3}\right\rangle=\frac{1}{2}|0\rangle+\frac{\sqrt{3}}{2}|1\rangle
$

* The figure below shows the Bloch representation for the three possible states our qubit could take if the short pulse happens 10% of the time, and the long pulse the remaining 10% of the time:


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_198.png)

* Since we do not know which of these states our qubit ends up in every time we prepare it, we can represent it as a mixed state of the form:

**Step 1: how do we get the density matrix? - Take the column vector (ket), turn it into a row vector by conjugate transposition (bra), and then multiply them both:**

> $(\hat{\rho})$ $=\left(\begin{array}{l}1 \\ 0\end{array}\right) \times\left(\begin{array}{ll}1 & 0\end{array}\right)$ = $\left(\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right)$

in this case:

> $
\rho_{H}=\frac{4}{5}\left|\psi_{1}\right\rangle\left\langle\psi_{1}\right|+\frac{1}{10}\left|\psi_{2}\right\rangle\left\langle\psi_{2}\right|+\frac{1}{10}\left|\psi_{3}\right\rangle\left\langle\psi_{3}\right|$

* Here, the factors $\frac{4}{5}, \frac{1}{10}$ and $\frac{1}{10}$ correspond to the classical probabilities of obtaining the states $\left|\psi_{1}\right\rangle,\left|\psi_{2}\right\rangle$ and $\left|\psi_{3}\right\rangle$, respectively.

**Step 2: Add the matrices together**: By replacing each of these three possible state vectors into  $\rho$ , we can find the density matrix that represents this mixture:

> $\rho_{H}=\frac{4}{5}\left[\begin{array}{ll}\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\end{array}\right]+\frac{1}{10}\left[\begin{array}{cc}\frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4}\end{array}\right]+\frac{1}{10}\left[\begin{array}{cc}\frac{1}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{3}{4}\end{array}\right]$
$\rho_{H}=\left[\begin{array}{cc}\frac{1}{2} & \frac{\sqrt{3}}{20}+\frac{2}{5} \\ \frac{\sqrt{3}}{20}+\frac{2}{5} & \frac{1}{2}\end{array}\right]$
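The two steps can be reproduced numerically. This is a minimal sketch that builds $\rho_H$ from the three state vectors and their classical probabilities and compares it with the closed form above; the helper `projector` is introduced only for this example.

```python
import numpy as np

def projector(ket):
    """Step 1: outer product |ket><ket| of a state vector with itself."""
    return np.outer(ket, ket.conj())

psi1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
psi2 = np.array([np.sqrt(3) / 2, 1 / 2], dtype=complex)
psi3 = np.array([1 / 2, np.sqrt(3) / 2], dtype=complex)

# Step 2: weight each projector with its probability and add them up.
rho_H = 0.8 * projector(psi1) + 0.1 * projector(psi2) + 0.1 * projector(psi3)

# Closed form of the result for comparison.
off_diag = np.sqrt(3) / 20 + 2 / 5
expected = np.array([[0.5, off_diag], [off_diag, 0.5]])

purity = np.trace(rho_H @ rho_H).real
print(rho_H.real)
print(np.allclose(rho_H, expected), purity)
```

The purity comes out slightly below 1, which reflects that the imperfect gate produces a mixed state rather than the ideal pure state $|\psi_1\rangle$.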

**Explanation 2: Density Matrix for Mixed States**

*Source: Parth G: https://www.youtube.com/watch?v=ZAOc4eMTQiw*

* **Pain Point**: when we see this notation for two states spin up and spin down: $|\psi\rangle=\frac{1}{\sqrt{2}}|\uparrow\rangle+\frac{1}{\sqrt{2}}|\downarrow\rangle$ we think of it as being in a superposition of both states. But sometimes we don't know whether the system is actually in this superposition or has already collapsed to one of the two states.

* So, in practice, before we measure a system it can be in any of the following three states, because we lack information about it. This is a mixed state:

  * 1/3 probability: $|\psi\rangle=|\uparrow\rangle$

  * 1/3 probability: $|\psi\rangle=|\downarrow\rangle$

  * 1/3 probability: $|\psi\rangle=\frac{1}{\sqrt{2}}|\uparrow\rangle+\frac{1}{\sqrt{2}}|\downarrow\rangle$

* **Mixed state**: lack of information about the system, hence for example giving equal probability to each possible outcome.

* **When dealing with a mixed state we need to represent it with what's known as a density operator or density matrix**.

* Wave function notation $|\psi\rangle$ can only be used to describe pure states. But density matrices $|\psi\rangle\langle\psi|$ can be used to describe mixed and pure states.

* and how do we get the density matrix? - Take the column vector (ket), turn it into a row vector by conjugate transposition (bra), and then multiply them both:

> $(\hat{\rho})$ $=\left(\begin{array}{l}1 \\ 0\end{array}\right) \times\left(\begin{array}{ll}1 & 0\end{array}\right)$ = $\left(\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right)$

On top of that, you can even combine several mixed states (with weights that sum to 1) to get one final mixed state. In a mixed state our knowledge of the system is limited: it could be in any one of multiple $\psi$ states.

* Image 1: We have to add up the density matrices of each possible spin state whilst making sure that we weight it with the probability of it being in that psi state

* Image 2: That would give us the final density matrix, which is the mixture in the mixed state.

* Image 3: It is for this reason that there is no way to write a mixed state as a single wave function or a single vector. We have to deal with matrices that represent the different pure states in which our system could be, and each of these density matrices has to be weighted by how likely it is that our system is in that pure state.

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_006a.png)
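
The weighted sum of density matrices described above can be sketched numerically. This is a minimal NumPy example, assuming the three candidate states from the list above with probability 1/3 each; the variable names (`ensemble`, `purity`) are illustrative and not part of the original notes.

```python
import numpy as np

# Basis kets for spin up and spin down.
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])
plus = (up + down) / np.sqrt(2)  # equal superposition of up and down

# Assumed ensemble: each of the three states occurs with probability 1/3.
ensemble = [(1 / 3, up), (1 / 3, down), (1 / 3, plus)]

# rho = sum_i p_i |psi_i><psi_i|; np.outer(v, v.conj()) builds the ket-bra.
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in ensemble)

trace = np.trace(rho).real         # 1 for any valid density matrix
purity = np.trace(rho @ rho).real  # 1 only for a pure state, smaller otherwise

print(rho)
print("trace:", trace, "purity:", purity)
```

The purity $\operatorname{Tr}[\rho^2]$ comes out below 1 here, which is one way to see that this ensemble cannot be written as a single state vector.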

###### ***Ket-Bra $|\psi\rangle \langle \psi |$ - Expectation Values***

An observable in quantum mechanics is represented by a Hermitian operator. The spectral norm (or operator norm) of a Hermitian operator is equal to its largest eigenvalue in absolute value.

<font color="blue">**<u>Density matrix formalism</u> to calculate Expectation Values**


* [Expectation value](https://en.m.wikipedia.org/wiki/Expectation_value_(quantum_mechanics)): The average of the eigenvalues of a von Neumann observable, weighted by the Born-rule probabilities, is the expectation value of that observable. What is the average value of quantum measurement outcomes?

* Expectation values matter when we measure a total effect aggregated over a large number of subatomic particles: run an experiment that observes the energy levels of a type of atom $N$ times under fixed conditions. The expectation value $e$ of the experiment may turn out not to be an allowed energy level of that atom, but $E = Ne$ gives the correct total energy, assuming $N$ is large enough.

* The expectation value of an observable $Q$ (like the momentum of a particle) in a normalized state $|\Psi\rangle$ is the probability-weighted average of all values one may observe in an experiment designed to measure $Q$ in state $|\Psi\rangle$. Over many experiments on identically prepared systems, the average of the measured values approaches $\langle\Psi|Q| \Psi\rangle$. (A measurement perturbs the system, so the state has to be prepared afresh each time.)

<font color="blue">**First: The expectation value of an operator $Q$ is given by $\langle Q\rangle=\langle\psi|Q| \psi\rangle$ = $\sum_{n} \lambda_{n}\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$(spectral decomposition)**</font>

* Given a state vector $|s\rangle$ for a pure state, with complex-conjugate transpose $\langle s|$, and an operator $A$ whose expectation value you are evaluating, the expectation value is $\langle s|A|s\rangle$. In general $|s\rangle$ has a component along each $\lambda$-eigenstate of $A$.


* A mixture of states is a weighted average of such operators, and its expectation value is the weighted average of the expectation values of the pure states.

* Suppose $\left|\psi_{1}\right\rangle,\left|\psi_{2}\right\rangle, \ldots$ is the orthonormal basis of eigenstates of the observable $Q$, with corresponding eigenvalues $\lambda_{1}, \lambda_{2}, \ldots$ respectively. One can expand the given (normalized) state $|\Psi\rangle$ in this basis as

> $
|\Psi\rangle=\alpha_{1}\left|\psi_{1}\right\rangle+\alpha_{2}\left|\psi_{2}\right\rangle+\cdots+\alpha_{n}\left|\psi_{n}\right\rangle+\ldots
$

* According to a basic postulate of Quantum mechanics the above expansion implies that upon measuring $Q$ in state $|\Psi\rangle$ the outcome of the experiment will be eigenvalue $\lambda_{1}$ with probability $\left|\alpha_{1}\right|^{2}$, eigenvalue $\lambda_{2}$ with probability $\left|\alpha_{2}\right|^{2}$ and so on. The (weighted) average of all possible values of $Q$ that can be observed in state $|\Psi\rangle$ will be the expectation value of $Q$ in state $|\Psi\rangle$:

> <font color="blue">$\sum_{n} \lambda_{n}\left|\left\langle \psi_{n} \mid \Psi\right\rangle\right|^{2}$ = $\left|\alpha_{1}\right|^{2} \lambda_{1}+\left|\alpha_{2}\right|^{2} \lambda_{2}+\ldots =\left\langle\alpha_{1} \psi_{1}+\alpha_{2} \psi_{2}+\ldots \mid \lambda_{1} \alpha_{1} \psi_{1}+\lambda_{2} \alpha_{2} \psi_{2}+\ldots\right\rangle =\left\langle\alpha_{1} \psi_{1}+\alpha_{2} \psi_{2}+\ldots|Q| \alpha_{1} \psi_{1}+\alpha_{2} \psi_{2}+\ldots\right\rangle =\langle\Psi|Q| \Psi\rangle$

* This is because the operator $Q$ can be written as a sum over its eigenvalues $\lambda_n$ times the projection operators onto its eigenstates $|u_n\rangle$.

> $Q=\sum_{n} \lambda_{n} \left| u_{n} \rangle \langle u_{n}\right|$ $\quad$ with $Q\left| u_{n}\right\rangle= \lambda_{n}\left| u_{n}\right\rangle$

* Using this decomposition in the definition of the expectation value, we get:

> $\langle Q\rangle_{\psi}=\langle\psi|Q| \psi\rangle=$ $\sum_{n} \lambda_{n}\langle\psi| u_{n} \rangle $ <font color ="orange">$\langle u_{n}| \psi \rangle$</font> $=\sum_{n} \lambda_{n}$ <font color ="blue">$\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$</font>

* We sum over all Eigenvalues, which are the possible outcomes of a measurement, weighted by the probabilities that this particular Eigenvalue occurs if we start in the state $\Psi$.

  * <font color ="orange">$\langle u_{n} | \psi\rangle$</font> - this is called the **transition amplitude**

  * <font color ="blue">$|\langle u_{n} | \psi\rangle|^{2}$</font> - this is called the **transition probability** (absolute square of the transition amplitude)

> $\langle\hat{A}\rangle$ $=\sum_{n} \lambda_{n} P\left(\lambda_{n}\right)$ and since: $P\left(\lambda_{n}\right)=\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}=\left|c_{n}\right|^{2}$:

> $\langle\hat{A}\rangle$ $=\sum_{n} \lambda_{n} \left|c_{n}\right|^{2}$</font>

> $\langle\hat{A}\rangle$ $=\sum_{n} \lambda_{n} \left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$</font>


> $|\psi\rangle \longrightarrow\langle\hat{A}\rangle_{\psi}=\langle\hat{A}\rangle=\sum_{n} \lambda_{n} P\left(\lambda_{n}\right)$ <font color ="blue">$=\sum_{n} \lambda_{n}\left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$

* <font color ="blue">$\rightarrow$ the sum of the eigenvalues $\lambda_{n}$, each multiplied by the probability $P\left(\lambda_{n}\right) = \left|\left\langle u_{n} \mid \psi\right\rangle\right|^{2}$ of obtaining that eigenvalue = expectation value</font>

* Example calculation with an interesting result: eigenvalue $-1$ with amplitude $\frac{1}{\sqrt{2}}$ and eigenvalue $+1$ with amplitude $\frac{1}{\sqrt{2}}$

  * = $(-1) \cdot \left(\frac{1}{\sqrt{2}}\right)^2$ + $(+1) \cdot \left(\frac{1}{\sqrt{2}}\right)^2$

  * = $(-1) \cdot 0.5$ + $(+1) \cdot 0.5$ = $-0.5 + 0.5$

  * = 0 $\rightarrow$ the expectation value, which is not itself an allowed measurement outcome (only $-1$ and $+1$ are).
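
The spectral-decomposition formula $\langle Q\rangle=\sum_n \lambda_n |\langle u_n|\psi\rangle|^2$ can be checked numerically. The sketch below assumes Pauli-Z as the observable (eigenvalues $\pm 1$) and the equal superposition with amplitudes $1/\sqrt{2}$ as the state; both choices are illustrative.

```python
import numpy as np

# Assumed observable: Pauli-Z, with eigenvalues +1 and -1.
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
# Assumed state: equal superposition with amplitudes 1/sqrt(2).
psi = np.array([1.0, 1.0]) / np.sqrt(2)

# eigh returns the eigenvalues in ascending order and the orthonormal
# eigenvectors |u_n> as the columns of the second return value.
eigenvalues, eigenvectors = np.linalg.eigh(Z)

# Transition amplitudes <u_n|psi> and transition probabilities |<u_n|psi>|^2.
amplitudes = eigenvectors.conj().T @ psi
probabilities = np.abs(amplitudes) ** 2

expectation_spectral = np.sum(eigenvalues * probabilities)
expectation_direct = np.vdot(psi, Z @ psi).real  # <psi|Z|psi>

print(probabilities, expectation_spectral, expectation_direct)
```

Both routes give the same number, and that number is not one of the eigenvalues, which is the point of the example calculation above.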





<font color="blue">**Second: The expectation value $\langle \psi|A| \psi\rangle$ can be evaluated as the trace of $|\psi\rangle\langle \psi| A$.**</font>

* For a normalized state, $\rho = |\psi\rangle \langle \psi|$ is the projector onto $|\psi\rangle$ and has trace 1. We can calculate the expectation value of $A$ by taking the trace of $\rho$ times $A$:

> **$\langle A\rangle_{\rho}:= \langle \psi|A| \psi\rangle = \operatorname{Tr}[\rho A] =\operatorname{Tr}[\, |\psi\rangle \langle \psi| \, A]$**</font>

* Taking the trace of an operator means summing its diagonal matrix elements in a complete orthonormal basis: $\operatorname{Tr}[B]=\sum_{i}\left\langle\eta_{i}|B| \eta_{i}\right\rangle$. We insert $\rho = |\psi\rangle \langle \psi|$ and may reorder the two factors $\langle\eta_i|\psi\rangle$ and $\langle\psi|A|\eta_i\rangle$, since they are complex numbers and not operators:

> $\langle A\rangle_{\rho}:=\operatorname{Tr}[\rho A]$ = $\sum_{i} {\left\langle\eta_{i}\right| \psi \rangle}{\left\langle\psi|A| \eta_{i}\right\rangle}$

* We sum over the basis states. Since $\sum_i | \eta_{i} \rangle \langle \eta_{i} | = \mathbb{1}$ (completeness relation), we get $\langle\psi|A| \psi\rangle$, so the density-matrix result agrees with the state-vector expression:

> $\sum_{i}\left\langle\psi|A| \eta_{i} \rangle \langle \eta_{i} | \psi\right\rangle$ = $\langle\psi|A| \psi\rangle=\langle A\rangle_{\psi}$
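
A short numerical check of $\operatorname{Tr}[\rho A]=\langle\psi|A|\psi\rangle$ is sketched below. It assumes Pauli-Y as the observable and an arbitrary normalized state; the second half extends the same trace formula to a hypothetical 50/50 mixture to show that the result is the weighted average of the pure-state values.

```python
import numpy as np

# Assumed observable: Pauli-Y. Assumed state: (3|0> + 4i|1>) / 5.
A = np.array([[0.0, -1.0j], [1.0j, 0.0]])
psi = np.array([3.0, 4.0j]) / 5.0

rho = np.outer(psi, psi.conj())                  # |psi><psi|
expectation_trace = np.trace(rho @ A).real       # Tr[rho A]
expectation_braket = np.vdot(psi, A @ psi).real  # <psi|A|psi>

# Hypothetical mixture: 50% |psi>, 50% |0>.
ket0 = np.array([1.0, 0.0])
rho_mixed = 0.5 * rho + 0.5 * np.outer(ket0, ket0.conj())
expectation_mixed = np.trace(rho_mixed @ A).real
weighted_average = 0.5 * expectation_braket + 0.5 * np.vdot(ket0, A @ ket0).real

print(expectation_trace, expectation_braket, expectation_mixed)
```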


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_195.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_201.png)

###### *Eigenvalues, Observables & Operators*

> <font color="blue">**<u>Postulate II of quantum mechanics</u>: a physical quantity $\mathcal{A}$ is associated with a hermitian operator $\hat{A}$, that is called an observable.**

> <font color="blue">**<u>Postulate III of quantum mechanics</u>: The result of a measurement of a physical quantity is one of the Eigenvalues of the associated observable.**

Video: [Eigenvalues and eigenstates in quantum mechanics](https://www.youtube.com/watch?v=p1zg-c1nvwQ)

> We consider the action of $\hat{A}$ on a special Ket $|\Psi\rangle$ such that the only way in which $\hat{A}$ changes $|\Psi\rangle$ is by scaling it by a constant and we obtain $\lambda |\Psi\rangle$

The eigenvectors of an operator are those special directions in the vector space that the operator leaves unchanged: it only scales vectors along them by the eigenvalue.

The probability of each eigenvalue is given by the squared norm of the projection of the physical state onto the subspace belonging to that eigenvalue.

https://en.m.wikipedia.org/wiki/Operator_(physics)#Operators_in_quantum_mechanics

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_257.png)

> [C$*$-algebras](https://en.m.wikipedia.org/wiki/C*-algebra) were first considered primarily for their use in quantum mechanics **to model algebras of physical observables**. This line of research began with Werner Heisenberg's matrix mechanics and in a more mathematically developed form with Pascual Jordan around 1933. See also [Quantum Group](https://en.m.wikipedia.org/wiki/Quantum_group).


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_251.png)


**(Anti-)Commutators, Quantum Numbers and the "Complete Set of Commuting Observables" (CSCO)**

To characterize a quantum-mechanical state uniquely, several observables are often needed. For the hydrogen atom, for example, it is not sufficient to specify only the energy (via the principal quantum number n); two further observables are needed: the magnitude of the angular momentum (quantum number l) and the z-component of the angular momentum (quantum number m). These three quantities then form a complete set of commuting observables.
A set of observables A, B, C, ... forms a CSCO if an orthonormal basis of the state space consisting of common eigenvectors of the observables exists, and this basis is unique (up to a phase factor).

Such behavior is, however, rather the exception in quantum mechanics. Most pairs of observables cannot be measured simultaneously with arbitrary precision, which is a consequence of the Heisenberg uncertainty relation. One then also speaks of complementary observables.

https://de.m.wikipedia.org/wiki/Vollständiger_Satz_kommutierender_Observablen

**Solving: How to find Eigenvalues and Eigenvectors?**

* For an arbitrary operator it is generally not possible to figure out the eigenvalues and eigenstates simply by inspecting how the operator acts (the identity operator is a trivial exception)

* A more general approach is needed:

	* Consider an orthonormal basis $\{|u_i\rangle\}$.

	* Write the eigenvalue equation in the $u$ basis (project both sides of the equation onto the basis states $|u_i\rangle$).

	* Next, insert the identity operator after the operator $\hat{A}$ on the left-hand side.

	* Then write the resolution of the identity in the $u$ basis.

	* Finally, move the sum to the front.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_260.png)

* The equation $\sum_{j}\left(A_{i j}-\lambda \delta_{i j}\right) c_{j}=0$, with $A_{ij}=\langle u_i|\hat{A}|u_j\rangle$ and $c_j=\langle u_j|\psi\rangle$, is equivalent to the original eigenvalue equation $\hat{A}|\psi\rangle=\lambda|\psi\rangle$, but now written in the $u$ representation

* Finding the eigenvalues and eigenvectors now becomes finding the $\lambda$ and $c_j$ in the new equation = "matrix diagonalization"

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_261.png)

> **All we really need to do quantum mechanics is to get enough practice in diagonalizing matrices**
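
As a small illustration of matrix diagonalization, the sketch below uses `np.linalg.eigh` on a hypothetical $2\times 2$ Hermitian matrix (the matrix is made up for this example and is not the operator shown in the images). It then verifies $\sum_j (A_{ij}-\lambda\delta_{ij})c_j=0$ for each eigenpair.

```python
import numpy as np

# Hypothetical Hermitian operator, written in an orthonormal basis {|u_i>}.
A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])

# eigh is meant for Hermitian matrices: real eigenvalues in ascending order,
# orthonormal eigenvectors as the columns of `eigenvectors`.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Check (A - lambda * 1) c = 0 for every eigenpair.
residuals = [
    np.linalg.norm((A - lam * np.eye(2)) @ eigenvectors[:, n])
    for n, lam in enumerate(eigenvalues)
]

# Rebuild A from its spectral decomposition: A = V diag(lambda) V^dagger.
A_rebuilt = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.conj().T

print(eigenvalues, residuals)
```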

* we have following operator A and state Psi

* we want to find the eigenvalues and eigenvectors of the operator A

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_263.png)

<font color="blue">**Operators acting on Quantum State Vectors: Matrix-Vector-Multiplication (Single Qubit)**

*Applying a Hadamard gate to a single qubit (matrix-vector multiplication) - Simple dot product! You get a vector out.*

$H |0\rangle = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right]= \frac{1}{\sqrt{2}}\left[\begin{array}{ll}1 + 0 \\ 1 + 0\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ 1\end{array}\right] = |+\rangle$

$H |1\rangle = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]= \frac{1}{\sqrt{2}}\left[\begin{array}{ll}0 + 1 \\ 0 -1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ -1\end{array}\right] = |-\rangle$

*These states can also be expanded in Dirac notation as sums of |0⟩ and |1⟩:*

$|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$ because <font color="gray">with $|0\rangle=\left[\begin{array}{l}1 \\ 0\end{array}\right]$ and $|1\rangle=\left[\begin{array}{l}0 \\ 1\end{array}\right]$ we get:</font> $\frac{1}{\sqrt{2}}\left[\begin{array}{ll}1 + 0 \\ 0 + 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ 1\end{array}\right]$

$|-\rangle=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$ because: $\frac{1}{\sqrt{2}}\left[\begin{array}{ll}1 - 0 \\ 0 - 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ -1\end{array}\right]$

*In summary, these are all the same thing but look completely different:*

<font color="blue">$H |0\rangle$</font> $ = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right] =\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ 1\end{array}\right]$ <font color="blue">$ \,\,= |+\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$

<font color="blue">$H |1\rangle$</font>$ = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{c}1 \\ -1\end{array}\right]$ <font color="blue">$ = |-\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$

*Here $|0\rangle$ and $|1\rangle$ form the basis in which the quantum states are expressed.* ***However, one can also choose a different basis, e.g. $|+\rangle$ and $|-\rangle$:***

$|0\rangle=\frac{1}{\sqrt{2}}(|+\rangle+|-\rangle)$ compare: <font color="blue">$|+\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$

$|1\rangle=\frac{1}{\sqrt{2}}(|+\rangle-|-\rangle)$ compare: <font color="blue">$|-\rangle$ = $\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$

*Applying an Identity Operator:*

$I |0\rangle = \left(\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right)\left[\begin{array}{l}1 \\ 0\end{array}\right]= \left[\begin{array}{ll}1 + 0 \\ 0 + 0\end{array}\right]=\left[\begin{array}{c}1 \\ 0\end{array}\right]$

$I |1\rangle = \left(\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right)\left[\begin{array}{l}0 \\ 1\end{array}\right]= \left[\begin{array}{ll}0 + 0 \\ 0 + 1\end{array}\right]=\left[\begin{array}{c}0 \\ 1\end{array}\right]$

*Applying a Pauli-X Operator:*

$X|0\rangle=\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right]\left[\begin{array}{l}1 \\ 0\end{array}\right]=\left[\begin{array}{ll}0 + 0 \\ 1 + 0\end{array}\right]=\left[\begin{array}{l}0 \\ 1\end{array}\right]$
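
The single-qubit matrix-vector products above can be reproduced with NumPy's `@` operator. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Computational basis states as vectors.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Single-qubit operators as 2x2 matrices.
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

plus = H @ ket0    # |+> = (|0> + |1>) / sqrt(2)
minus = H @ ket1   # |-> = (|0> - |1>) / sqrt(2)
flipped = X @ ket0 # X|0> = |1>

# Change of basis back: |0> = (|+> + |->) / sqrt(2).
ket0_from_pm = (plus + minus) / np.sqrt(2)

print(plus, minus, flipped, ket0_from_pm)
```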


<font color="blue">**Matrix-Vector-Multiplication (Multi Qubit)**

> $A \mathbf{x}=\left[\begin{array}{cccc}a_{11} & a_{12} & \ldots & a_{1 n} \\ a_{21} & a_{22} & \ldots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m 1} & a_{m 2} & \ldots & a_{m n}\end{array}\right]\left[\begin{array}{c}x_{1} \\ x_{2} \\ \vdots \\ x_{n}\end{array}\right]=\left[\begin{array}{c}a_{11} x_{1}+a_{12} x_{2}+\cdots+a_{1 n} x_{n} \\ a_{21} x_{1}+a_{22} x_{2}+\cdots+a_{2 n} x_{n} \\ \vdots \\ a_{m 1} x_{1}+a_{m 2} x_{2}+\cdots+a_{m n} x_{n}\end{array}\right]$

**First create the tensor product of two operators**: To construct the desired two-qubit gate, you need the same tensor product operation as you used for the vectors. Here $H_1$ is the one-qubit Hadamard gate embedded in the two-qubit space, $(\hat{H} \otimes \mathbf{I})$, so that the Hadamard is applied to only one qubit:

> $H_{1} \equiv H_{0} \otimes I=\frac{1}{\sqrt{2}}\left(\begin{array}{ll}1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) & 1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) \\ 1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) & -1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right)\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}{cccc}1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1\end{array}\right)$

**Second, apply the new 'tensored' operator on the tensor product of two state vectors**:

See the tensor product of two vectors in state $|0\rangle$:

> $|0\rangle \otimes|0\rangle = \left[\begin{array}{l}1 \\ 0\end{array}\right] \otimes\left[\begin{array}{l}1 \\ 0\end{array}\right]=$</font> $\left[\begin{array}{l}1\left[\begin{array}{l}1 \\ 0\end{array}\right] \\ 0\left[\begin{array}{l}1 \\ 0\end{array}\right]\end{array}\right]=$ <font color="blue">$\left [\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right]$

Now applying $H_1$ you mix up the first qubit states and keep the second qubit state unchanged:

> $H_{1}(|0\rangle \otimes|0\rangle)=H_{1}\left(\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right)$= <font color="blue">$\frac{1}{\sqrt{2}}\left(\begin{array}{cccc}1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1\end{array}\right) \left(\begin{array}{l}1 \\ 0 \\ 0 \\ 0\end{array}\right)$ = $\frac{1}{\sqrt{2}}\left(\begin{array}{l}1 \\ 0 \\ 1 \\ 0\end{array}\right)$</font>= $\frac{1}{\sqrt{2}}(|0\rangle \otimes|0\rangle+|1\rangle \otimes|0\rangle)$

**Another way of writing this (applying H to only one qubit of a two-qubit system) for a joint state of two initialized qubits $
|0\rangle \otimes|0\rangle
$ is:**

> <font color="orange">$\hat{H}\left|q_{0}\right\rangle \otimes\left|q_{1}\right\rangle$</font> = $
\mathbf{H}|0\rangle \otimes|0\rangle=\left(\frac{1}{\sqrt{2}}|0\rangle+\frac{1}{\sqrt{2}}|1\rangle\right) \otimes|0\rangle
$

The tensor product is distributive, which in this case means it acts much like multiplication:

>$
\mathbf{H}|0\rangle \otimes|0\rangle=\frac{1}{\sqrt{2}}|0\rangle \otimes|0\rangle+\frac{1}{\sqrt{2}}|1\rangle \otimes|0\rangle
$

See [Tensor_product](https://en.wikipedia.org/wiki/Tensor_product)
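
Both routes described above, building $H\otimes I$ first or applying $H$ to the first factor only, can be compared with `np.kron`. The sketch assumes the ordering used in the text, where the first tensor factor is the qubit that $H$ acts on.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# Route 1: build the 4x4 operator H (x) I and apply it to |0> (x) |0>.
H1 = np.kron(H, I2)
ket00 = np.kron(ket0, ket0)
via_matrix = H1 @ ket00

# Route 2: apply H to the first qubit only, then take the tensor product.
via_factors = np.kron(H @ ket0, ket0)

# Expected result from the text: (|0>|0> + |1>|0>) / sqrt(2).
expected = (np.kron(ket0, ket0) + np.kron(ket1, ket0)) / np.sqrt(2)

print(via_matrix)
```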


<font color="blue">**Matrix-Matrix-Multiplication (Kronecker / Tensor)**

Used to combine operators within one time step (e.g. H on qubit 1 and I on qubit 2). Two systems are described as a joint system.

> $\left[\begin{array}{ll}a & b \\ c & d\end{array}\right] \otimes\left[\begin{array}{ll}e & f \\ g & h\end{array}\right]=\left[\begin{array}{ll}a\left[\begin{array}{ll}e & f \\ g & h\end{array}\right] & b\left[\begin{array}{ll}e & f \\ g & h\end{array}\right] \\ c\left[\begin{array}{ll}e & f \\ g & h\end{array}\right] & d\left[\begin{array}{ll}e & f \\ g & h\end{array}\right]\end{array}\right]=\left[\begin{array}{llll}a e & a f & b e & b f \\ a g & a h & b g & b h \\ c e & c f & d e & d f \\ c g & c h & d g & d h\end{array}\right]$

$H \otimes I=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right) \otimes\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}{ll}1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) & 1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) \\ 1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) & -1\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right)\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}{rrrr}1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1\end{array}\right)$

$I \otimes H=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) \otimes \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right) = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right) & 0\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right) \\ 0\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right) & 1\left(\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right)\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}{cccc}1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1\end{array}\right)$

$Y \otimes X=\left[\begin{array}{cc}0 & -i \\ i & 0\end{array}\right] \otimes\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right]=\left[\begin{array}{ll}0\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right] & -i\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right] \\ i\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right] & 0\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right]\end{array}\right]=\left[\begin{array}{cccc}0 & 0 & 0 & -i \\ 0 & 0 & -i & 0 \\ 0 & i & 0 & 0 \\ i & 0 & 0 & 0\end{array}\right]$

$X \otimes H=\left[\begin{array}{ll}0 & 1 \\ 1 & 0\end{array}\right] \otimes \frac{1}{\sqrt{2}}\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}0 \times\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right] & 1 \times\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right] \\ 1 \times\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right] & 0 \times\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]\end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{cccc}0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0\end{array}\right] =\left[\begin{array}{cc}
0 & H \\
H & 0
\end{array}\right]$
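
The Kronecker products worked out by hand above can be cross-checked with `np.kron`. This is a small sketch; the last comparison additionally illustrates that the tensor product is not commutative.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

YX = np.kron(Y, X)  # Y (x) X
XH = np.kron(X, H)  # X (x) H

# Block form of X (x) H: [[0, H], [H, 0]].
zero = np.zeros((2, 2))
XH_blocks = np.block([[zero, H], [H, zero]])

# H (x) I and I (x) H are different 4x4 matrices.
HI = np.kron(H, I2)
IH = np.kron(I2, H)

print(YX)
print(np.allclose(XH, XH_blocks), np.allclose(HI, IH))
```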


**<font color="blue">Matrix-Matrix-Multiplication (Usual)**

> Used for serially wired gates, i.e. gates applied one after another on the same qubit(s). Gates that act in the same time step on different qubits are instead combined with the tensor product.

A line in the circuit is considered as a quantum wire and basically represents a single qubit. The product of operators keeps the same dimension.


![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Serially_wired_quantum_logic_gates.png/500px-Serially_wired_quantum_logic_gates.png)

For example, putting the Pauli X gate after the Pauli Y gate, both of which act on a single qubit, can be described as a single combined gate C (the gate applied first stands on the right):

>$
C=X \cdot Y=\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right] \cdot\left[\begin{array}{cc}
0 & -i \\
i & 0
\end{array}\right]=\left[\begin{array}{cc}
i & 0 \\
0 & -i
\end{array}\right]
$

>$
X \cdot X=\left(\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right)\left(\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right)=\left(\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right)=I
$
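
The ordinary matrix products for serially wired gates can be verified in a few lines. The sketch also computes the reversed order $Y\cdot X$, which is an addition for illustration, to show that the order of serial gates matters.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

C = X @ Y          # Y is applied first, then X
C_reversed = Y @ X # X is applied first, then Y
XX = X @ X         # applying X twice

print(C)
print(C_reversed)
print(XX)
```

The combined gate equals $iZ$, while the reversed order gives $-iZ$.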

<font color="blue">**Hamiltonian Operator $\mathcal{H}$**

> <font color="red">**The Hamiltonian $\mathcal{H}$ is the operator for the total energy of the entire system, i.e. the sum of its kinetic and potential energy; its eigenvalues are the possible outcomes of an energy measurement (used, for example, to calculate the lowest total energy of a two-atom system)**

* The [Hamiltonian](https://de.m.wikipedia.org/wiki/Hamiltonoperator) (energy operator) is the operator in quantum mechanics **that specifies the (possible) measured energy values and the time evolution, i.e. it describes the total energy of a system or particle**. <font color="red">For example, it yields the energy levels of the electron in the hydrogen atom.</font>

* In quantum mechanics, every state of the physical system under consideration is specified by an associated vector $\psi$ in Hilbert space. Its time evolution is determined, according to the Schrödinger equation, by the Hamiltonian $\hat{H}$:

>$
\mathrm{i} \hbar \frac{\partial}{\partial t} \psi(t)=\hat{H} \psi(t)
$

* **For every problem there is a different Hamiltonian and a different corresponding eigenspectrum**. Spectrum: the range of possible measured values. Eigenvalues = stable energy levels, whose eigenstates describe the electrons in the orbitals. See also [Hamiltonian function](https://de.m.wikipedia.org/wiki/Hamilton-Funktion).



<font color="blue">**Time Evolution Operator $\mathcal{U}$ or $\mathcal{T}$**

* The [time evolution operator](https://de.m.wikipedia.org/wiki/Zeitentwicklungsoperator) $\mathcal{U}$ or $\mathcal{T}$ is a quantum-mechanical operator used to compute the time evolution of a physical system. See also [Time evolution](https://en.m.wikipedia.org/wiki/Time_evolution)

* The quantum-mechanical operator is closely related to the [propagator](https://de.m.wikipedia.org/wiki/Propagator) in quantum field theory or many-body theory. It is usually written as $U\left(t, t_{0}\right)$ and describes the evolution of the system from time $t_{0}$ to time $t$.

The time evolution operator $U\left(t, t_{0}\right)$ is defined via the time evolution of an arbitrary state $|\psi\rangle$ from time $t_{0}$ to time $t$:

>$
|\psi(t)\rangle=U\left(t, t_{0}\right)\left|\psi\left(t_{0}\right)\right\rangle \quad \forall|\psi\rangle
$

Substituting this into the Schrödinger equation yields a set of first-order ordinary differential equations:

>$\mathrm{i} \hbar \frac{\partial}{\partial t} U\left(t, t_{0}\right)=H(t) U\left(t, t_{0}\right)$

These equations are equivalent to the Schrödinger equation in that they describe the extension of the time evolution operator by an infinitesimal time step $\delta t$:

>$
U\left(t+\delta t, t_{0}\right)=\left(1-\frac{i}{\hbar} H(t) \delta t\right) U\left(t, t_{0}\right)+O\left(\delta t^{2}\right)
$

with the Hamiltonian $H$, which is the generator of time evolution.
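
For a time-independent Hamiltonian the differential equation above is solved by $U(t, 0)=e^{-iHt/\hbar}$. The sketch below assumes exactly that special case, with a made-up two-level Hamiltonian and $\hbar=1$, and builds the exponential from the eigendecomposition of $H$. It then checks unitarity and the infinitesimal-step relation numerically; `time_evolution_operator` is a hypothetical helper name.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch

# Hypothetical time-independent two-level Hamiltonian.
H = np.array([[1.0, 0.5], [0.5, -1.0]])

def time_evolution_operator(H, t, hbar=1.0):
    """Return U(t, 0) = exp(-i H t / hbar) for a time-independent Hermitian H."""
    energies, states = np.linalg.eigh(H)
    phases = np.exp(-1j * energies * t / hbar)
    return states @ np.diag(phases) @ states.conj().T

t = 0.7
U = time_evolution_operator(H, t)

# Unitarity: U^dagger U should be the identity.
unitarity_error = np.max(np.abs(U.conj().T @ U - np.eye(2)))

# Infinitesimal step: U(t + dt) = (1 - i H dt / hbar) U(t) + O(dt^2).
dt = 1e-4
U_next = time_evolution_operator(H, t + dt)
U_step = (np.eye(2) - 1j * H * dt / hbar) @ U
step_error = np.max(np.abs(U_next - U_step))

print(unitarity_error, step_error)
```

The step error is of order $\delta t^2$, in line with the $O(\delta t^2)$ term in the equation above.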

<font color="blue">**Position Operator $\hat{x}$**

The [position operator](https://de.m.wikipedia.org/wiki/Ortsoperator) is the operator in quantum mechanics associated with measuring the position of particles.

* The physical state $\Psi$ of a particle is given mathematically in quantum mechanics by the associated vector of a Hilbert space $\mathrm{H}$.

* In bra-ket notation this state is therefore described by the vector $|\Psi\rangle$.

* Observables are represented by self-adjoint operators on $\mathrm{H}$.

Specifically, the position operator comprises the three observables $\hat{\mathbf{x}}=\left(\hat{x}_{1}, \hat{x}_{2}, \hat{x}_{3}\right)$, such that

>$
E\left(\hat{x}_{j}\right)=\left\langle\hat{x}_{j} \Psi, \Psi\right\rangle_{\mathrm{H}}, \quad j=1,2,3
$

is the mean (expectation value) of the measurement results of the $j$-th position coordinate of the particle in state $\Psi$.

<font color="blue">**Momentum Operator $\hat{p}$**

Video: [Why Momentum in Quantum Physics is Complex](https://www.youtube.com/watch?v=kG-iihrYCG4&list=WL&index=22)

Classical Momentum

> p = m * v

Quantum mechanics: apply the momentum operator to the wave function (eigenvalue equation):

> $\hat{p}|\Psi\rangle=\lambda|\Psi\rangle$

Momentum measurement operator

> $\hat{p}=-i \hbar \frac{\partial}{\partial x}$ $\quad$ with: $\, i=\sqrt{-1}$

The [momentum operator](https://de.m.wikipedia.org/wiki/Impulsoperator) $\hat{p}$ is the operator in quantum mechanics for measuring the momentum of particles. In the position representation, the momentum operator in one dimension is given by (with $\frac{\partial}{\partial x}$ the partial derivative along the position coordinate $x$):

>$
\hat{p}_{x}=-\mathrm{i} \hbar \frac{\partial}{\partial x}=\frac{\hbar}{i} \frac{\partial}{\partial x}
$

With the nabla operator $\nabla$, one obtains the vector in three dimensions:

>$
\hat{\mathbf{p}}=-\mathrm{i} \hbar \nabla
$

* The physical state $\Psi$ of a particle is given mathematically in quantum mechanics by an associated vector of a Hilbert space $\mathcal{H}$. In bra-ket notation this state is therefore described by the vector $|\Psi\rangle$.

* Observables are represented by self-adjoint operators on $\mathcal{H}$. Specifically, the momentum operator comprises the three observables $\hat{\mathbf{p}}=\left(\hat{p}_{1}, \hat{p}_{2}, \hat{p}_{3}\right)$, such that

>$
E\left(\hat{p}_{j}\right)=\left\langle\Psi\left|\hat{p}_{j}\right| \Psi\right\rangle \quad j=1,2,3
$

is the mean (expectation value) of the measurement results of the $j$-th component of the momentum of the particle in state $\Psi$.
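
To make the eigenvalue equation $\hat{p}|\Psi\rangle=\lambda|\Psi\rangle$ concrete, the sketch below applies $\hat{p}=-i\hbar\,\partial/\partial x$ to a plane wave $e^{ikx}$ on a grid, using central finite differences as a stand-in for the derivative. The plane wave, the grid and $\hbar=1$ are assumptions of this example; the ratio $\hat{p}\psi/\psi$ should come out close to $\hbar k$.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch
k = 2.0     # assumed wave number of the plane wave

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]
psi = np.exp(1j * k * x)  # plane wave, a momentum eigenfunction

# p psi = -i hbar d(psi)/dx, with the derivative from finite differences.
p_psi = -1j * hbar * np.gradient(psi, dx)

# At interior points (central differences) the ratio approximates hbar * k.
ratio = p_psi[1:-1] / psi[1:-1]

print(ratio.real.mean(), np.abs(ratio.imag).max())
```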

<font color="blue">**Translation Operator $\hat{T}$**

* the translation operator is the operator that allows us to move quantum states from one point to another

* it allows us to understand many properties of wave functions:

> **the wave function in real space is related to the wave function in momentum space by a Fourier transform**

> $[\hat{x}, \hat{p}]=i \hbar$

* position $\hat{x}$ and momentum $\hat{p}$ operators. Their most important property is their commutator which is equal to $i \hbar$

* **Translation operator**:

> $\hat{T}(\alpha)=e^{-i \alpha \hat{p} / \hbar} \quad \alpha \in \mathbb{R}$

* this is an operator that translates by an amount $\alpha$

* What does it mean to have a **function of an operator**, like the exponential function here: the function of an operator is defined by its Taylor expansion

* Adjoint of an operator: tells us what the operator looks like in the dual space (the translation operator is NOT Hermitian, as you can see):

> $\begin{aligned} \hat{T}^{\dagger}(\alpha)=& e^{i \alpha \hat{p}^{\dagger} / \hbar}=e^{i \alpha \hat{p} / \hbar}=e^{-i(-\alpha) \hat{p} / \hbar}=\hat{T}(-\alpha) \\ & \hat{p}^{\dagger}=\hat{p} \end{aligned}$

* Let's look at the action of $\hat{T}^{\dagger}(\alpha)$ on $\hat{T}(\alpha)$:

> $\hat{T}^{\dagger}(\alpha) \hat{T}(\alpha)=e^{i \alpha \hat{p} / \hbar} e^{-i \alpha \hat{p} / \hbar}=\mathbb{1}$

> $[\hat{p}, \hat{p}]=0$

* Remember: in general we cannot combine exponents of operators as if they were numbers, but here we can, because the two exponents commute: the operator $\hat{p}$ commutes with itself

* An operator whose adjoint is equal to its inverse is called a unitary operator

> $\left.\begin{array}{l}\hat{T}^{\dagger}(\alpha) \hat{T}(\alpha)=e^{i \alpha \hat{p} / \hbar} e^{-i \alpha \hat{p} / \hbar}=I \\ {[\hat{p}, \hat{p}]=0} \\ \hat{T}(\alpha) \hat{T}^{\dagger}(\alpha)=e^{-i \alpha \hat{p} / \hbar} e^{i \alpha \hat{p} / \hbar}=I\end{array}\right\} \hat{T}^{\dagger}(\alpha)=\hat{T}^{-1}(\alpha)$

* Overall:

> $\hat{T}^{\dagger}(\alpha)=\hat{T}^{-1}(\alpha)=\hat{T}(-\alpha)$
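
The action of $\hat{T}(\alpha)=e^{-i\alpha\hat{p}/\hbar}$ can be sketched on a periodic grid: in momentum space $\hat{p}$ acts as multiplication by $\hbar k$, so the translation becomes a phase factor $e^{-ik\alpha}$ applied between a forward and an inverse FFT. The Gaussian test function, the grid parameters and the helper name `translate` are assumptions of this example.

```python
import numpy as np

# Periodic grid (assumed): n points on [-L/2, L/2).
n = 256
L = 20.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)  # wave numbers, p = hbar * k

def translate(psi, alpha):
    """Apply T(alpha) = exp(-i alpha p / hbar) via multiplication in k-space."""
    return np.fft.ifft(np.exp(-1j * k * alpha) * np.fft.fft(psi))

psi = np.exp(-x**2 / 2)                   # Gaussian centered at x = 0
alpha = 2.0
shifted = translate(psi, alpha)           # should be centered at x = alpha
expected = np.exp(-(x - alpha)**2 / 2)

# T^dagger(alpha) = T(-alpha) undoes the translation (unitarity).
restored = translate(shifted, -alpha)

print(np.max(np.abs(shifted - expected)), np.max(np.abs(restored - psi)))
```

The norm $\sum |\psi|^2\,dx$ is unchanged by the translation, as expected for a unitary operator.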

###### *Schrödinger Equation*

> for fast-moving (relativistic) electrons or electrons in electromagnetic fields, the Schrödinger equation gives the wrong answers (see quantum field theory, spinor field)

<font color="blue">**From Newton's 2nd Law of Motion to the Schrödinger Equation**

**Step 1: Starting with Newton's 2nd law of motion**

* let's assume a particle is moving along an x-axis and we apply several forces on it $\overrightarrow{F}_1$, $\overrightarrow{F}_2$, .. $\overrightarrow{F}_n$

* these forces depend on the position $x$ of the particle and the time elapsed $t$

Then we can find the particle's position $x$ as a function of time using **Newton's 2nd law**:

> $\vec{F}_{net}=\sum \vec{F}_{i}(x, t)=m \vec{a}  \quad(\text { Newton's 2nd law })$

But the acceleration $\vec{a}$ can also be written as the second time derivative of position, so we end up with a [governing equation](https://en.wikipedia.org/wiki/Governing_equation) like this, the **Equation of Motion**:

> $m \frac{d^{2} x}{d t^{2}}=\sum_{i=1}^{n} F_{i}(x, t) \quad(\text { Equation of Motion })$

* (for a single spring force $F=-kx$, for example, this reads $m \ddot x = -kx$; see Colab 'Variationsrechnung')

* Once we solve this equation for the particle's position, we could infer many things about the particle's state, such as its velocity, kinetic energy etc.

* *Aside: The governing equations of a mathematical model describe how the values of the unknown variables (i.e. the dependent variables) change when one or more of the known (i.e. independent) variables change*
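As a small illustration of "solve the equation of motion, then infer the state", the sketch below integrates $m \ddot x = F(x,t)$ with the velocity Verlet scheme. The spring force, the parameter values and the function name `force` are assumptions chosen for the example, not something fixed by the text.

```python
import numpy as np

m, k = 1.0, 4.0          # illustrative mass and spring constant
x, v = 1.0, 0.0          # initial position and velocity
dt, steps = 1e-3, 2000   # integrate up to t = 2

def force(x, t):
    """Net force; here a single spring force F = -k x (an assumed example)."""
    return -k * x

t = 0.0
for _ in range(steps):
    a = force(x, t) / m
    x_new = x + v * dt + 0.5 * a * dt**2      # position update
    a_new = force(x_new, t + dt) / m
    v = v + 0.5 * (a + a_new) * dt            # velocity update
    x, t = x_new, t + dt

omega = np.sqrt(k / m)
x_exact = np.cos(omega * t)                   # analytic solution for these initial values
kinetic_energy = 0.5 * m * v**2               # a quantity inferred from the solved motion
print(x, x_exact, kinetic_energy)
```

Once $x(t)$ and $v(t)$ are known, quantities such as the kinetic energy follow directly, which is the classical counterpart of extracting observables from the wavefunction.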

**Step 2: Schrödinger Equation & Operators / Functionals**
* The goal of quantum mechanics is to solve the Schrödinger equation, which is conceptually very similar

> $i \hbar \frac{\partial \Psi}{\partial t}=\frac{-\hbar^{2}}{2 m} \frac{\partial^{2} \Psi}{\partial x^{2}}+V \Psi$


* in contrast to the classical equation of motion, where we solve for the position and then derive velocity, kinetic energy etc. of the particle, **in the Schrödinger equation we solve for the wavefunction** $\Psi$ (it plays the role that the position plays classically)

  * $\frac{-\hbar^{2}}{2 m} \frac{\partial^{2}}{\partial x^{2}}$ is the kinetic energy operator acting on the wavefunction $\Psi$

  * $V$ represents the potential energy operator

<font color="red">**In quantum mechanics we use the term operator because you generally don't get fixed numerical values for kinetic and potential energy.**</font>

* Instead you need to perform operations on the wavefunction to extract those kinetic and potential energy values. That's what these operators do.

* You can think of the Schrödinger equation as a statement of energy conservation: kinetic energy + potential energy = total energy (which is on the left side of the equation)

**Step 3: Wavefunction & (Dirac) Delta Function**

* the wavefunction represents the state of a system - related to the probability of finding a particle at a particular region in the domain which it occupies

* the square of the norm of the wavefunction gives the probability density function of a particle.

* if you integrate this norm squared over the entire domain you will get 1

> $\int_{-\infty}^{\infty}|\Psi|^{2} d x=1$

* you can't tell exactly where the position of a particle is before measurement, just a probability, same for velocity, momentum, kinetic energy etc.

* But if you **apply the measurement operator, you change the wavefunction**! After one measurement, further immediate measurements will always give you the same result

**So instead of being a probability distribution that covers multiple values, <font color="red">by taking a measurement I change the wavefunction to a [delta function](https://de.wikipedia.org/wiki/Delta-Distribution) with one spike at what my measurement gave me. If I take more measurements on the same system, the delta function doesn't change.</font> This is called the wavefunction collapse.**

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_174.png)

* However, if I let the system evolve so that it eventually returns to the wavefunction it originally had, and then take my measurement, I might get a different result, according to the probability distribution corresponding to my original wavefunction

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_175.png)
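The collapse picture above can be imitated on a grid. This is only a toy sketch under stated assumptions: the state is a discretized half sine wave, the helper `measure_position` is hypothetical, and the delta function is represented by a single nonzero grid entry.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized half sine wave on a box of width 1 (assumed example state).
x = np.linspace(0.0, 1.0, 201)
psi = np.sin(np.pi * x)
psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2))   # discrete normalization

def measure_position(state):
    """Sample a grid index from |state|^2 and return it with the collapsed state."""
    prob = np.abs(state) ** 2
    idx = rng.choice(len(state), p=prob / prob.sum())
    collapsed = np.zeros_like(state)
    collapsed[idx] = 1.0     # discrete analogue of a delta spike
    return idx, collapsed

first, collapsed = measure_position(psi)
# Repeated measurements on the collapsed state always return the same position.
repeats = [measure_position(collapsed)[0] for _ in range(5)]
print(x[first], [x[i] for i in repeats])
```

Measuring a fresh copy of `psi` instead of `collapsed` would again give results spread according to $|\Psi|^2$.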

**Step 4: Partial Differential Equation**

> $i \hbar \frac{\partial \Psi}{\partial t}=\frac{-\hbar^{2}}{2 m} \frac{\partial^{2} \Psi}{\partial x^{2}}+V \Psi$

* when solving a partial differential equation we also need auxiliary conditions in order to determine the unknown constants that you get from the integration process that's inherent in solving a differential equation

* auxiliary conditions = initial conditions and boundary conditions

* however the Schrodinger equation doesn't come with your typical boundary conditions that you might be used to seeing

**Instead, the auxiliary condition we have on our solution is the normalization constraint: $\int_{-\infty}^{\infty}|\Psi|^{2} d x=1$**

* Solutions to the Schrödinger equation have to be normalizable because they are wavefunctions, and their squared modulus represents a probability density function

	* if the solutions to Schrödinger's equation are not normalizable, we can't use them to represent a physical system

	* the trivial solution $\Psi(x,t) = 0$ is not normalizable, because its integral from negative infinity to infinity is always 0 and can never be 1. That is why $\Psi(x,t) = 0$ is an unphysical solution: the particle has to be somewhere

**Step 5: Solving Schrodinger's Equation under the normalization condition**

Let's say we solve Schrodinger's equation and we get following solution:

> $\Psi (x,t) = A f(x,t)$ with $A$ being an arbitrary constant

* The process of normalization is to find the value of the constant $A$ so that the solution obeys the normalization condition:

> $\int_{-\infty}^{\infty}|\Psi|^{2} d x=\int_{-\infty}^{\infty}|A f|^{2} d x = 1$

* we might end up with a difficult task if the wavefunction is dependent on time, which means it has a different shape for different times, then wouldn't the normalization constant change with time as well?

* the answer is NO!

**Theorem**: If you normalize the wavefunction once, you don't need to normalize it again at other times: the normalization is preserved!

**Proof**:

* the norm squared of a wavefunction is just the complex conjugate of that wavefunction times the wavefunction: $|\Psi|^{2}=\Psi^{*} \Psi$

* if we go back to the Schrödinger equation, its complex conjugate (for a real potential $V$) has all the $\Psi$'s turned into their conjugates and the signs of all imaginary terms switched:

> $i \hbar \frac{\partial \Psi}{\partial t}=\frac{-\hbar^{2}}{2 m} \frac{\partial^{2} \Psi}{\partial x^{2}}+V \Psi$

> $-i \hbar \frac{\partial \Psi^{*}}{\partial t}=\frac{-\hbar^{2}}{2 m} \frac{\partial^{2} \Psi^{*}}{\partial x^{2}}+V \Psi^{*}$

* the goal of the proof is to show that the normalization integral $\int_{-\infty}^{\infty}|\Psi|^{2} d x=$ const doesn't change with time:

* the way to do this is to take the time derivative: if we can show that the time derivative of the normalization integral is 0, then our proof is complete: $\frac{d}{d t}\left[\int_{-\infty}^{\infty}|\Psi|^{2} d x\right]=0$

* see complete proof in [video](https://www.youtube.com/watch?v=kUm4q0UIpio&list=WL&index=72&t=651s)
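The theorem can also be illustrated numerically. The sketch below is an assumption-laden toy, not the proof: it uses units with $\hbar = m = 1$, a harmonic potential $V = x^2/2$ picked only for the example, a finite-difference Hamiltonian on a grid, and a hypothetical helper `evolve` that applies $e^{-i H t}$ through the eigendecomposition of that matrix.

```python
import numpy as np

# Units with hbar = m = 1 (assumption); grid on [-10, 10].
n = 400
x = np.linspace(-10.0, 10.0, n)
dx = x[1] - x[0]

# Finite-difference Hamiltonian H = -(1/2) d^2/dx^2 + V(x), with V = x^2 / 2.
main = 1.0 / dx**2 + 0.5 * x**2
off = -0.5 / dx**2 * np.ones(n - 1)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Exact propagator of the discretized problem: U(t) = exp(-i H t).
evals, evecs = np.linalg.eigh(H)

def evolve(psi0, t):
    """Apply exp(-i H t) to psi0 using the eigenbasis of H."""
    return evecs @ (np.exp(-1j * evals * t) * (evecs.T @ psi0))

# Displaced Gaussian, normalized once at t = 0.
psi0 = np.exp(-(x - 2.0) ** 2).astype(complex)
psi0 /= np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)

# The shape of |psi|^2 changes with t, the normalization integral does not.
norms = [np.sum(np.abs(evolve(psi0, t)) ** 2) * dx for t in (0.0, 0.5, 1.0, 3.0)]
print(norms)
```

The printed values stay at 1 up to rounding because the discretized Hamiltonian is Hermitian, so the propagator is unitary.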

<font color="blue">**Schrödinger Equation (Time Independent / Eigenvalue Equation)**</font>

Time independent = Total energy of a system does NOT change with time

**The Schrödinger equation can be written as a type of eigenvalue equation**

> $\hat{H}|\psi\rangle= i \hbar \frac{d}{d t}|\psi\rangle =-\frac{\hbar}{i} \frac{d}{d t}|\psi\rangle$

**Simplified (when system is not changing over time: time-independent Schroedinger equation):**

> $\hat{H}|\psi\rangle=E |\psi\rangle$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_151.png)

> $E \Psi(x)=\frac{-\hbar^{2}}{2 m} \frac{d^{2} \Psi(x)}{d x^{2}}+V \Psi(x)$

* E = **energy** the electron is allowed to have

* $\Psi$ = **wavefunction** (its squared modulus $|\Psi|^2$ gives the probability density for the electron's position)

* **Kinetic energy term**: $\frac{-\hbar^{2}}{2 m} \frac{d^{2} \Psi(x)}{d x^{2}}$ (classical form: $K E=\frac{1}{2} m v^{2}$)

* **Potential energy term**: $V \Psi(x)$


*What would a typical Schrödinger solution look like? For a particle in a box of width $L$ centered at $x=0$, all the solutions take one of these two forms:*

> $\Psi(x)=\sqrt{\frac{2}{L}} \cos \left(\frac{\pi n x}{L}\right)$ when $n=1,3,5 \ldots$ (odd)

> $\Psi(x)=\sqrt{\frac{2}{L}} \sin \left(\frac{\pi n x}{L}\right)$ when $n=2,4,6 \ldots$ (even)

*Now looking at $\psi$, the probable position of an electron:*

* central question: where is the electron?

* n is the energy state / level of an electron (look above at quantum numbers)

* When an electron is in state n=1 (its first energy state) we apply the first formula: $\Psi(x)=\sqrt{\frac{2}{L}} \cos \left(\frac{\pi n x}{L}\right)$

* then we get the wave function for an electron in the given box:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_147.png)

* And if we square it, we get the probability distribution (the probable position of an electron):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_148.png)


* And here some wave functions and probability densities for other energy states:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_149.png)

*And the solution that popped out was this:*

> $E=\frac{\hbar^{2} n^{2} \pi^{2}}{2 m L^{2}}$

* Everything is a constant ($\hbar$, $\pi$, 2, m, L) or a whole number (here: n, which stands for the different states of an electron)

* which means that the energy E can only have certain discrete (= quantum) values
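To get a feeling for the numbers, the formula $E=\frac{\hbar^{2} n^{2} \pi^{2}}{2 m L^{2}}$ can be evaluated directly. The box width of 1 nm and the function name `box_energy` are assumptions for this example; the constants are the SI values of $\hbar$, the electron mass and the electronvolt.

```python
import numpy as np

hbar = 1.054571817e-34      # reduced Planck constant in J*s
m_e = 9.1093837015e-31      # electron mass in kg
eV = 1.602176634e-19        # joules per electronvolt
L = 1e-9                    # assumed box width: 1 nm

def box_energy(n, mass=m_e, width=L):
    """E_n = hbar^2 n^2 pi^2 / (2 m L^2), in joules."""
    return (hbar * n * np.pi) ** 2 / (2.0 * mass * width**2)

energies_eV = [box_energy(n) / eV for n in (1, 2, 3)]
print(energies_eV)   # grows like n^2: E_2 = 4 E_1, E_3 = 9 E_1
```

For these assumed values the lowest level comes out at roughly 0.38 eV, and no energies between the listed levels are allowed.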



<font color="blue">**How to solve the Schrödinger Equation (Time Independent)**

Assumption: the particle moves on a line between 0 and a, bounded by infinitely high walls on the left and right, where the potential energy is infinite and the probability of finding the particle is 0. This means that the particle can only be found on the line between 0 and a.

[SOLVING the SCHRODINGER EQUATION | Quantum Physics by Parth G](https://www.youtube.com/watch?v=sPZWtZ8vt1w)

**Step 1: Take Schrodinger Equation, remove $V$ and focus on differential equation of kinetic energy term**

* now we want to compute the wavefunction of this particle with the (Time Independent) Schrödinger Equation

> $\frac{-\hbar^{2}}{2 m} \frac{d^{2} \Psi}{d x^{2}}+V \Psi=E \Psi$

* the potential energy V is zero between 0 and a because nothing influences the particle, so it becomes this **differential equation** that we need to solve:

> $\frac{-\hbar^{2}}{2 m} \frac{d^{2} \Psi}{d x^{2}}=E \Psi$

* $\frac{d^{2} \Psi}{d x^{2}}$ is second derivative of $\Psi$ with respect to x

We want to solve this equation:

> $\frac{-\hbar^{2}}{2 m}$ <font color="red">$ \frac{d^{2} \Psi}{d x^{2}}$</font> $=E \Psi$

* Normally you compute <font color="red">$ \frac{d^{2} \Psi}{d x^{2}}$</font>, the second derivative of $\Psi$ with respect to x, when you know what $\Psi$ is,

* but we don't. We want to go the other way around and find $\Psi$ from a condition on its second derivative, which is trickier.

**Step 2: Rearrange the constants**

Luckily we have two constants in our equation (blue):

> <font color="blue">$\frac{-\hbar^{2}}{2 m}$</font> $ \frac{d^{2} \Psi}{d x^{2}}$ = <font color="blue">$E$</font> $\Psi$

Which means we can rearrange the equation to this:

> $ \frac{d^{2} \Psi}{d x^{2}}$ = <font color="blue">$\frac{-2m E}{\hbar^{2}}$</font>  $\Psi$

Now we can combine all constants in one constant $-k^2$ = $\frac{-2m E}{\hbar^{2}}$

> $ \frac{d^{2} \Psi}{d x^{2}}$ = <font color="blue">$-k^2$</font>  $\Psi$

**Step 3: Identify suitable function for this equation**

* So which type of function obeys this relation $ \frac{d^{2} \Psi}{d x^{2}}$ = <font color="blue">$-k^2$</font>  $\Psi$? - Would be a [sinusoid](https://de.wikipedia.org/wiki/Sinusoid)!

* $\frac{d^2 y}{d x^{2}}=-y$ - when you start with a sine and differentiate it twice you still end up with a sinusoidal term

* so if we carefully account for the constants in our equation, our solution is going to look like a sinusoid:

> $\Psi$ = $\sin \left(\frac{\sqrt{2 m E}}{\hbar} x\right)$ and replacing <font color="blue">$\frac{\sqrt{2 m E}}{\hbar}$</font> with $k$ $\rightarrow$ $\Psi$ = $\sin ($ <font color="blue">$k$</font> $x)$

*Compare this with the equation from before ($k$ is the positive square root of the constant there, without the minus sign):*

> $ \frac{d^{2} \Psi}{d x^{2}}$ = <font color="blue">$\frac{-2m E}{\hbar^{2}}$</font>  $\Psi$ =  <font color="blue">$-k^2$</font>  $\Psi$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_176.png)
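That $\Psi = \sin(kx)$ really satisfies $\frac{d^{2} \Psi}{d x^{2}} = -k^2 \Psi$ can be checked with a finite-difference second derivative. This is a minimal sketch; the value of `k` and the grid are arbitrary choices for the example.

```python
import numpy as np

k = 3.0                                   # arbitrary wavenumber
x = np.linspace(0.0, 2.0, 2001)
dx = x[1] - x[0]
psi = np.sin(k * x)

# Central finite difference for the second derivative on interior points.
d2psi = (psi[2:] - 2.0 * psi[1:-1] + psi[:-2]) / dx**2

# Residual of d2psi/dx2 + k^2 psi = 0; small, limited by the grid spacing.
residual = np.max(np.abs(d2psi + k**2 * psi[1:-1]))
print(residual)
```

Replacing `np.sin` with `np.cos` gives an equally small residual: both solve the differential equation, and only the boundary conditions decide between them.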

**Step 4: Work out the boundary conditions to solve the differential equation, i.e. to get the wavefunction**


**When x = 0 $\rightarrow$ $\Psi$ = 0**

* at the wall at x = 0 we need $\Psi$ = 0: the wavefunction vanishes outside the box and must be continuous, otherwise its first derivative would become infinite

* this works nicely with a sinusoid, since $\Psi = \sin(kx)$ $\rightarrow$ $\sin(k \cdot 0) = 0$

**When x = a $\rightarrow$ $\Psi$ = 0**

* since $\Psi = \sin(kx)$, this requires $\sin(ka) = 0$

* we essentially find a restriction on the kind of sine wave that we can have as a solution

* for example half a sine wave is a possible solution

  * it is zero at x=0 and at x=a (at the walls), so $\sin(ka) = 0$

  * we went through half a sine wave, which means that the argument $ka$ must be equal to 180 degrees (because that's half a period of the sine)

  * and if we use radians instead of degrees, which is the more natural unit for measuring angles, then 180 degrees is equal to <font color="blue">$\pi$ radians = (ka)</font>

  * So: $ka = \pi$

* this means that this equation holds true if our wavefunction is half a sine wave <font color="blue">$(ka)$ = $\pi$ = $\frac{\sqrt{2m E}}{\hbar}a$</font> , recall: k = $\frac{\sqrt{2m E}}{\hbar}$

* from earlier: $\Psi$ = $\sin \left(\frac{\sqrt{2 m E}}{\hbar} x\right)$ =  $\sin ($ <font color="blue">$k$</font> $x)$

**Step 5: Rearrange that equation to get the energy value**


* and if we rearrange <font color="blue">$(ka)$ = $\frac{\sqrt{2m E}}{\hbar}a$ = $\pi$ </font> we have something that tells us the value of the energy $E$:

> $E=\frac{\hbar^{2} \pi^{2}}{2 m a^{2}}$

* (with the reduced Planck constant $\hbar=\frac{h}{2 \pi} =1.054571817 \ldots \times 10^{-34} \mathrm{~J} \cdot \mathrm{s}$)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_178.png)

* in other words: if our wavefunction looks like this (half a sine wave), then the energy of our particle is $E=\frac{\hbar^{2} \pi^{2}}{2 m a^{2}}$

* another possible solution is a full sine wave fitting into this region; the argument at the far wall is then 360 degrees = 2$\pi$ radians, because we went through the whole sine wave

  * if the wavefunction looks like a whole sine wave <font color="blue">$\frac{\sqrt{2m E}}{\hbar}a$</font> = 2$\pi$

  * then the energy of the particle is $E=\frac{4\hbar^{2} \pi^{2}}{2 m a^{2}}$

* We can continue doing this for any whole number $n$ of half sine waves ($ka = n\pi$), so we could have three or four half sine waves in our region, and in each case we can calculate the energy of the particle when its wavefunction looks like that: $E_n=\frac{n^{2}\hbar^{2} \pi^{2}}{2 m a^{2}}$

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_177.png)

* This phenomenon is called "quantization".

  * because we can only have specific wave functions, and they correspond to specific energies, a particle can therefore only have specific energies

  * so it cannot be anything in between and it cannot be less than the minimum of half a sine wave $E_{1}=\frac{\hbar^{2} \pi^{2}}{2 m a^{2}}$

  * the boundary condition at x = 0 is also why cosine doesn't work for this particular setup (normally it does though): $\Psi=\cos \left(\frac{\sqrt{2 m E}}{\hbar} x\right)$ equals 1 at the wall instead of 0
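The quantized energies can be reproduced without guessing the sine solution, by diagonalizing a discretized Hamiltonian with $\Psi = 0$ at both walls. This is a sketch under the assumption of units with $\hbar = m = a = 1$, in which the analytic levels are $E_n = n^2 \pi^2 / 2$; the grid size `N` is an arbitrary choice.

```python
import numpy as np

# Units with hbar = m = a = 1 (assumption), so E_n = n^2 pi^2 / 2.
N = 500                         # number of interior grid points
h = 1.0 / (N + 1)               # grid spacing; psi = 0 at both walls

# H = -(1/2) d^2/dx^2 as a tridiagonal matrix on the interior points.
H = (np.diag(np.full(N, 1.0 / h**2))
     + np.diag(np.full(N - 1, -0.5 / h**2), 1)
     + np.diag(np.full(N - 1, -0.5 / h**2), -1))

numeric = np.linalg.eigvalsh(H)[:3]          # three lowest eigenvalues
analytic = np.array([n**2 * np.pi**2 / 2.0 for n in (1, 2, 3)])
print(numeric, analytic)
```

The lowest eigenvalues agree with the half-sine, full-sine and three-half-sine energies, and there is nothing below the first one.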

**Step 6: Normalization to get probabilities**

*Normalization of the wavefunction*

* there is one more thing to consider when finding a solution to the Schrodinger equation: Normalization

> $\Psi = \sqrt{\frac{2}{a}} \sin \left(\frac{\sqrt{2 m E}}{\hbar} x\right)$

* it adds a factor of $\sqrt{\frac{2}{a}}$ to our solution

* physical meaning: suppose our particle is in the lowest energy level $E_1$. Then, in our specific setup with the two walls, the wavefunction looks like half a sine wave. And remember that the wavefunction corresponds directly to the probability of finding the particle at a particular point in space: if we take the square modulus of the wavefunction, $|\Psi|^2$, we get the probability density
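A quick numerical check of the factor $\sqrt{\frac{2}{a}}$: with it, $|\Psi|^2$ integrates to 1 over the box, and integrating over part of the box gives the probability of finding the particle there. The box width `a = 1` and the helper `trapezoid` are assumptions made for this sketch.

```python
import numpy as np

a = 1.0                                  # box width (arbitrary units)
x = np.linspace(0.0, a, 10001)
psi = np.sqrt(2.0 / a) * np.sin(np.pi * x / a)   # ground state, k a = pi
density = np.abs(psi) ** 2

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

total = trapezoid(density, x)                     # whole box
middle = (x >= a / 4) & (x <= 3 * a / 4)
p_middle = trapezoid(density[middle], x[middle])  # middle half of the box
print(total, p_middle)
```

For the ground state the middle half of the box holds a probability of $\frac{1}{2} + \frac{1}{\pi} \approx 0.82$, so the particle is most likely found near the center.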

<font color="blue">**Schrödinger Equation (Time Dependent)**

**Consider: the difference between probabilities in the position basis (can change over time) and in the energy basis (don't change):**

> <font color="red">**The electron is still in the same shell (represented by the principal quantum number, for example), because if the electron changed shell, energy would have to be added to or removed from the overall system. As long as the energy stays the same, the electron stays in the same shell. Time evolution only rotates the phase of each energy component, which is represented by the rotation $e^{-i \frac{\hat{H} t}{\hbar}}$; for a superposition of energy states, the probability distribution of finding the electron somewhere in space can therefore change over time while the energy probabilities stay fixed.**</font>

>$
\left[-\frac{\hbar^{2}}{2 m} \nabla^{2}+V(\vec{r})\right] \psi(\vec{r})=E \psi(\vec{r})
$

The object on the left that acts on $\psi(x)$ is an example of an operator.

>$
\left[-\frac{\hbar^{2}}{2 m} \nabla^{2}+V(\vec{r})\right]
$ = Operator

In effect, what it says to do is "take the second derivative of $\psi(x)$, multiply the result by $-\left(\hbar^{2} / 2 m\right)$ and then add $V(x) \psi(x)$ to the result of that."

Quantum mechanics involves many different types of operators. This one, however, plays a special role because it appears on the left side of the Schrödinger equation. **It is called the Hamiltonian operator and is denoted as**

> $
\hat{H}=-\frac{\hbar^{2}}{2 m} \nabla^{2}+V(\vec{r})
$

**Therefore the time-dependent Schrödinger equation can be written as**:

> $
\hat{H} \psi(x, t)=i \hbar \frac{\partial}{\partial t} \psi(x, t)
$

with $\hat{H}$ = $(-\frac{\hbar^{2}}{2 m} \nabla^{2}+V(\vec{r}))$ will be:

> $
(-\frac{\hbar^{2}}{2 m} \nabla^{2}+V(\vec{r})) \; \psi(x, t)=i \hbar \frac{\partial}{\partial t} \psi(x, t)
$

or, rewritten in one dimension:

> $\left[-\frac{\hbar^{2}}{2 m} \frac{\partial^{2}}{\partial x^{2}}+V(x, t)\right] \Psi(x, t) = i \hbar \frac{\partial}{\partial t} \Psi(x, t)$

or, written in another way (single-particle variant):

> <font color="red">$[-\frac{\hbar^{2}}{2 m} \nabla^{2} $</font> + <font color="green">$V(x, t)$</font> ]
 <font color="blue">$|\psi\rangle$</font> = $i \hbar \frac{\partial}{\partial t}$ <font color="blue">$|\psi\rangle$</font>

* <font color="red">$-\frac{\hbar^{2}}{2 m} \nabla^{2} $</font> Kinetic energy

* <font color="green">$V(x, t)$</font> Potential energy

* <font color="blue">$|\psi\rangle$</font>  the wave function

<font color="blue">*Schrodinger equation: Derivation and how to use it (in Time Evolution)*

Important rules of physics:

* Conservation of energy -> deeply integrated into Schrodinger equation
* total energy doesn't change
* you can't make or destroy energy

**Since we can write a quantum state $|\Psi \rangle$ in whatever basis we want, we can choose the energy eigenbasis**. <font color="red">What is meant by that? Is that the principal quantum number, for example?</font>

* We can write a state as a superposition of states with different energies, where $|n\rangle$ is the energy eigenstate with energy $E_n$.

* And if we measure the energy of the particle, it will be one of these energies, each with its own probability

> $|\Psi \rangle$ = $\alpha |1 \rangle + \beta |2 \rangle + \gamma |3 \rangle$

* with probability for example $|\beta|^2$ for measuring the second energy

**Say the state evolves in time, in other words we apply the time evolution $U$ (or $T$) to $|\Psi \rangle$, so $T |\Psi \rangle$**

* what condition do we want to impose on the new energies of the state?

* In other words: how we want conservation of energy to look in quantum mechanics?

**Let's start where a particle just has one energy $E$ when we start**

* this means it is an energy eigenstate!

* we evolve it forward in time and look at the energy of the new state. That energy should be also $E$, otherwise energy wouldn't be conserved (Like in classical mechanics).

> $|\Psi \rangle$ = $|1 \rangle$ $\rightarrow$ $T|\Psi \rangle$

* Now the average energy also shouldn't change after some time, otherwise the energy wouldn't be conserved either.

> $|\Psi \rangle$ = $\alpha |1 \rangle + \beta |2 \rangle + \gamma |3 \rangle$

* if you measure the particle's second energy initially with a certain probability $|\beta|^2$, and then again after time evolution, the probability to measure that energy should be the same!

* this is so strong, it gives us the schroedinger equation


**We need to know how the coefficients have changed after time evolution**:

> $|\Psi \rangle$ = $\alpha |1 \rangle + \beta |2 \rangle + \gamma |3 \rangle$ (before)

> $T |\Psi \rangle$ = $\alpha' |1 \rangle + \beta' |2 \rangle + \gamma' |3 \rangle$ (after)

* We want the probability to be the same, and that probability is just the length of the complex number squared: $|\gamma|^2 = |\gamma'|^2$

> <font color="red">**So each coefficient can be represented as an arrow, and $\gamma$ and $\gamma'$ have equal length (hence the probability of measuring that energy is still the same!!), BUT $\gamma'$ may be rotated by an angle $\phi$**. That is, new vector = rotation × old vector:</font>

> <font color="red">$\gamma' = e^{i\phi}\gamma$</font>

Let's plug that rotation $e^{i\phi}$ into our previous equation:

> <font color="red">$T |\Psi \rangle$ = $e^{i\phi_1}\alpha |1 \rangle + e^{i\phi_2}\beta |2 \rangle + e^{i\phi_3}\gamma |3 \rangle$</font>

* where the angles / rotations $e^{i\phi}$ are different for every energy = they are all rotated by a different amount !! Otherwise the rotation could be factored out, and the present and future state would be essentially the same:

> $T |\Psi \rangle$ = $e^{i\phi}\alpha |1 \rangle + e^{i\phi}\beta |2 \rangle + e^{i\phi}\gamma |3 \rangle$ = $e^{i\phi} (\alpha |1 \rangle + \beta |2 \rangle + \gamma |3 \rangle)$ (this is showing that it's wrong!)

The overall rotation wouldn't affect any measurement outcomes. That would mean that no matter what crazy situation you put the particle in, nothing ever happens to it, which can't be right.

> $T |\Psi \rangle$ = $e^{i\phi}|\Psi \rangle$ (this is showing that it's wrong!)


* also, the amount of rotation depends on time (going forward a little = rotating a little). That suggests the right angle to rotate by is energy × time, plus some constants to deal with units and scaling. With the sign convention of the Schrödinger equation $\hat{H}\psi = i\hbar \frac{\partial}{\partial t}\psi$, the angle for the state with energy $E_n$ is

> <font color="red">$\phi_n = -\frac{E_n t}{\hbar}$</font>

* And that's what the Schrödinger equation tells you will happen to the state:

> $T |\Psi \rangle$ = $e^{-i \frac{E_1 t}{\hbar}}\alpha |1 \rangle + e^{-i \frac{E_2 t}{\hbar}}\beta |2 \rangle + e^{-i \frac{E_3 t}{\hbar}}\gamma |3 \rangle$

* And that's the same as (with $\hat{H}$ for the energy measurement operator, the Hamiltonian):

> <font color="red">$T(t) |\Psi \rangle = e^{-i \frac{\hat{H} t}{\hbar}}|\Psi \rangle$</font>

*Common result for 2 observations:*

Time evolution step by step, observer 1 (first for a duration $t_1$, then for a further duration $t_2$):

> $| \Psi \rangle$ $\rightarrow$ after $t_1$: $e^{-\frac{i \mathcal{H} t_1}{\hbar}} | \Psi \rangle$ $\rightarrow$ after a further $t_2$: <font color="blue">$e^{-\frac{i \mathcal{H} t_2}{\hbar}} (e^{-\frac{i \mathcal{H} t_1}{\hbar}} | \Psi \rangle)$</font>

Time evolution in one go for observer 2 (not seeing the intermediate step):


> $| \Psi \rangle$ $\rightarrow$ after $t_1 + t_2$: <font color="orange">$e^{-\frac{i \mathcal{H} (t_1 + t_2)}{\hbar}} | \Psi \rangle$</font>

Where:

> <font color="orange">$e^{-\frac{i \mathcal{H} (t_1 + t_2)}{\hbar}} | \Psi \rangle$</font> = <font color="blue">$e^{-\frac{i \mathcal{H} t_2}{\hbar}} (e^{-\frac{i \mathcal{H} t_1}{\hbar}} | \Psi \rangle)$</font>

Why? Because our angle of rotation is proportional to $t$ (and not $t^2$ or anything else): $T\left(t_{1}+t_{2}\right)=T\left(t_{2}\right) T\left(t_{1}\right)$

Taken from [Schrodinger equation comment response and homework answers video](https://www.youtube.com/watch?v=M_2h5uQ0SIc)
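The two properties used in this derivation, unchanged energy probabilities and $T(t_1+t_2)=T(t_2)T(t_1)$, can be checked on a three-level toy system. Everything specific here is an assumption for illustration: the energy values in `E`, the coefficients in `c0`, units with $\hbar = 1$, and the standard sign convention $e^{-iEt/\hbar}$ for the phase (the sign affects neither check).

```python
import numpy as np

hbar = 1.0                                          # natural units (assumption)
E = np.array([1.0, 2.5, 4.0])                       # hypothetical energy levels
c0 = np.array([0.6, 0.48j, 0.64], dtype=complex)    # alpha, beta, gamma (normalized)

def T(t):
    """Time evolution in the energy eigenbasis: one phase per level."""
    return np.diag(np.exp(-1j * E * t / hbar))

t1, t2 = 0.3, 1.1
step_by_step = T(t2) @ (T(t1) @ c0)     # observer 1: two steps
in_one_go = T(t1 + t2) @ c0             # observer 2: one step

probs_before = np.abs(c0) ** 2
probs_after = np.abs(in_one_go) ** 2
print(probs_before, probs_after)
```

Each coefficient is only rotated in the complex plane, so its length, and with it the probability of measuring that energy, stays the same.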

**Harmonic Oscillator (Hamiltonians / Solution of Schrodinger equation)**

* The nuclear motion Schrödinger equation can be solved in a space-fixed (laboratory) frame, **but then the translational and rotational (external) energies are not accounted for**. Only the (internal) atomic vibrations enter the problem.

* Further, for molecules larger than triatomic ones, it is quite common to introduce the **harmonic approximation, which approximates the potential energy surface** as a [quadratic function](https://en.m.wikipedia.org/wiki/Quadratic_function) of the atomic displacements. **This gives the harmonic nuclear motion Hamiltonian**.

* Making the harmonic approximation, we can **convert the Hamiltonian into a sum of uncoupled one-dimensional [harmonic oscillator](https://en.m.wikipedia.org/wiki/Harmonic_oscillator) Hamiltonians**.

> **The one-dimensional harmonic oscillator is one of the few systems that allows an exact solution of the Schrödinger equation.**

https://de.m.wikipedia.org/wiki/Nullpunktsenergie#Harmonischer_Oszillator

https://de.m.wikipedia.org/wiki/Harmonischer_Oszillator_(Quantenmechanik)

###### ***Matrix Mechanics*** *(Heisenberg, discrete basis, spin representation, Kronecker delta function)*

**Matrix Mechanics (spin representation - discrete basis) - (Heisenberg)**

* Matrix Mechanics (spin representation - discrete basis)

* Video: [Matrix formulation of quantum mechanics](https://www.youtube.com/watch?v=wIwnb1ldYTI)

* **Matrix mechanics ('matrix formulation'): Most useful when we deal with finite, discrete bases (like spin representation).**

* **Matrix formulation of quantum mechanics reduces to the rules of simple matrix multiplication.**

> $\hat{A}=\sum_{i j} A_{i j}\left|u_{i}\right\rangle\left\langle u_{j}\right| \quad A_{i j}=\left\langle u_{i}|\hat{A}| u_{j}\right\rangle$

* An operator $\hat{A}$ can be written in the $u$ basis as a sum over the outer products of the basis states
* The expansion coefficients $A_{ij}$ are given by the matrix elements of $\hat{A}$ with respect to the basis states
* The expansion coefficients of an operator are labeled by 2 indices, so we arrange them in a square matrix, with the first index denoting the row and the second index the column
* Therefore operators are written as matrices

> $\left(\begin{array}{ccccc}A_{11} & A_{12} & \cdots & A_{1 j} & \cdots \\ A_{21} & A_{22} & \cdots & A_{2 j} & \cdots \\ \vdots & \vdots & & \vdots & \\ A_{i 1} & A_{i 2} & \cdots & A_{i j} & \cdots \\ \vdots & \vdots & & \vdots & \end{array}\right)$
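The two formulas above, $A_{i j}=\langle u_{i}|\hat{A}| u_{j}\rangle$ and $\hat{A}=\sum_{i j} A_{i j}|u_{i}\rangle\langle u_{j}|$, can be tried out with NumPy. This is a sketch with made-up inputs: a random 3x3 operator `A` and a random orthonormal basis taken from the columns of a unitary matrix `Q`.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 3

# Hypothetical operator and orthonormal basis (columns of a unitary Q).
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
u = [Q[:, i] for i in range(dim)]

# Matrix elements A_ij = <u_i| A |u_j>; np.vdot conjugates its first argument.
A_u = np.array([[np.vdot(u[i], A @ u[j]) for j in range(dim)] for i in range(dim)])

# Rebuild A from the outer products |u_i><u_j| weighted by A_ij.
A_rebuilt = sum(A_u[i, j] * np.outer(u[i], u[j].conj())
                for i in range(dim) for j in range(dim))
print(np.allclose(A, A_rebuilt))
```

The first index of `A_u` labels the row and the second the column, matching the matrix layout shown above.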

Kets are written as column vectors:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_218.png)

Matrix formulation of bras: arrange the complex-conjugated coefficients as a row vector:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_219.png)

An operator $\hat{A}$ can be written in the $u$ basis as a sum over the outer products of the basis states, with the $A_{ij}$ as expansion coefficients. Operators are written as matrices:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_220.png)

Summary of the matrix formulation of quantum mechanics for kets, bras and operators:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_221.png)

We will see how simple matrix multiplication rules work for the following 4 operations:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_222.png)

First, in the matrix formulation of quantum mechanics, a bracket is the matrix product of a row vector with a column vector, and gives a scalar:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_223.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_224.png)

Adjoint operator: describe the action of an operator in the dual space:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_225.png)

Write an operator as an outer product of two states:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_226.png)

Summary:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_227.png)

###### ***Wave Mechanics*** *(Schrödinger, continuous basis, position representation, Dirac delta function)*

**First: let's go from discrete basis $u_i$ to continuous basis $v_{\alpha}$**

* Much used **discrete basis**: Spin of quantum particles

* Much used **continuous basis**: position of quantum particles (this one leads to the idea of wave function)

> **The generalization is straightforward: it amounts to replacing the Kronecker delta $\delta_{i j}$ of two discrete variables with the Dirac delta function $\delta (\alpha - \beta)$ of two continuous variables, and sums over discrete indices with integrals over continuous indices**.

* first concept: we work with an orthonormal basis $\left\langle u_{i} \mid u_{j}\right\rangle=\delta_{i j}$. replacing Kronecker delta functions $\delta_{i j}$ of two discrete variables with the Dirac delta function $\delta (\alpha - \beta)$

* then we look at the expansion of Ket in a particular basis: replacing a sum over i with an integral over alpha

* in yellow: just a proof why we would write a Dirac delta function only under an integral sign

* last part: representation of an operator in a particular basis (for continuous basis)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_217.png)

**Wave Mechanics (Schrödinger)**

* Wave Mechanics: Wave Function (position representation - continuous basis)

* Check also part under Operator: Translation Operators (**Wave Mechanics: Translation Operator**)

* Video: [Wave functions in quantum mechanics](https://www.youtube.com/watch?v=2lr3aA4vaBs)

* https://en.m.wikipedia.org/wiki/Wave_function

* wave functions are one possible way of looking at a quantum system

* this is the so-called position representation of quantum mechanics

* it leads to wave mechanics (for a continuous basis)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_232.png)

Computing the scalar product between two states and the normalization:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_233.png)

How to get from the position representation to the momentum representation (via a first order differential equation):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_234.png)

**Transformation matrix $\langle x|p\rangle$ to go from the position representation to the momentum representation**:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_235.png)

To change between both representations we use Fourier transform:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_236.png)
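The change of representation by Fourier transform can be imitated with a discrete FFT. This is a rough sketch, not the continuous transform itself: the grid, the Gaussian wave packet and its mean wavenumber `k0` are assumptions, and the result is expressed in terms of the wavenumber $k = p/\hbar$.

```python
import numpy as np

n = 1024
x = np.linspace(-20.0, 20.0, n, endpoint=False)
dx = x[1] - x[0]

# Gaussian wave packet with mean wavenumber k0 (illustrative values).
k0 = 2.0
psi_x = np.exp(-x**2 / 2.0) * np.exp(1j * k0 * x)
psi_x /= np.sqrt(np.sum(np.abs(psi_x) ** 2) * dx)

# Discrete stand-in for the Fourier transform to the wavenumber (p / hbar) basis.
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
dk = 2.0 * np.pi / (n * dx)
psi_k = np.fft.fft(psi_x) * dx / np.sqrt(2.0 * np.pi)

norm_x = np.sum(np.abs(psi_x) ** 2) * dx       # normalization in position space
norm_k = np.sum(np.abs(psi_k) ** 2) * dk       # normalization in wavenumber space
k_mean = np.sum(k * np.abs(psi_k) ** 2) * dk   # mean wavenumber
print(norm_x, norm_k, k_mean)
```

The state stays normalized in both representations, and the wavenumber distribution is centered at `k0`, the value imprinted by the factor $e^{i k_0 x}$ in position space.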

If we go to 3 dimensions:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_237.png)

Summary:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_238.png)

**Wave Mechanics: Position and momentum operators acting on wave functions**

* Video: [Position and momentum operators acting on wave functions](https://www.youtube.com/watch?v=Yw2YrTLSq5U)

* Action of position and momentum operators on wave functions

* The action of the position operator: it multiplies a wave function by x:

> $x \psi(x)$

* the action of a momentum operator: it acts by calculating the derivative of a wave function

> $-i \hbar \frac{d \psi(x)}{d x}$

* First: describe the act of a position operator in the position basis, and the act of a momentum operator in the momentum basis:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_239.png)

Second: the momentum operator acting on a state $|\Psi\rangle$ written in the position representation (= basis) calculates the derivative of the wavefunction and then multiplies the result by $-i\hbar$ (to show this we need the translation operator):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_240.png)

Third: what happens when we act with the position operator in the momentum basis (proof: the momentum-space wavefunction is related to the real-space wavefunction by a Fourier transform):

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_241.png)

Generalize this to 3 dimensions:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_242.png)

Summary: as you can see, there are different representations of the same thing, but the maths is easier in some than in others. So the task is to find a representation in which a given problem is easy:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_243.png)
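The position-representation rules from this part, multiplication by $x$ for the position operator and $-i \hbar \frac{d}{d x}$ for the momentum operator, can be sketched on a grid. The plane wave test function, the value of `k` and units with $\hbar = 1$ are assumptions for the example; the derivative is approximated by a central difference.

```python
import numpy as np

hbar = 1.0                      # natural units (assumption)
k = 1.5                         # wavenumber of the test plane wave
x = np.linspace(0.0, 10.0, 5001)
dx = x[1] - x[0]
psi = np.exp(1j * k * x)

# Position operator: multiply the wave function by x.
x_psi = x * psi

# Momentum operator: -i hbar d/dx, via a central difference on interior points.
p_psi = -1j * hbar * (psi[2:] - psi[:-2]) / (2.0 * dx)

# A plane wave is a momentum eigenstate, so p_psi / psi should equal hbar * k.
ratio = p_psi / psi[1:-1]
print(ratio.real.mean(), np.abs(ratio.imag).max())
```

The ratio is the same at every grid point, which is what it means for the plane wave to be an eigenfunction of the momentum operator with eigenvalue $\hbar k$.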

###### *Wavefunction & Global Phase Difference (Why we factor it out and leave phase difference only)*

**What does the wavefunction (one type) look like?**

> $\psi=e^{\frac{i}{\hbar}(px - Et)}$

* p = momentum in direction x, x = position along x direction, E = energy, t = time

* $e$ is the exponential function; with a real exponent, as in $e^{-x}$ or $e^{x}$, it does not look like a wave

* but the imaginary unit $i=\sqrt{-1}$ in the exponent turns the exponential function into a wave

* sinusoidal functions (sine and cosine) can be written in terms of the exponential function with $i$ in the exponent

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_244.png)

* we can take one complicated wave function and break it down into simpler waves, then apply the same maths to each of the simpler waves:

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_245.png)

Global phase factor $e^{i\theta}$:

* Eigenvalues and Eigenvectors exist also in other vector spaces than state spaces, but because state space is a complex vector space there is one important extra subtlety compared to real vector spaces which has to do with the global phase factor $e^{i\theta}$

* after choosing alpha to make the length of an Eigenstate equal to 1, we still have some extra freedom in the Eigenstate

* multiplying $|\Psi\rangle$ with a Global phase factor $e^{i\theta}$ makes the length of the resulting $|\Psi'\rangle$ still = 1

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_258.png)

Let's see what happens when we square the wave function:
* the function is an oscillation in quantum possibilities moving through space and time
* but it's a complex wave with one real and one imaginary component
* **the components oscillate in sync with each other - but they are offset, shifted in phase by a constant amount**
> phase is just the wave's current state in its up-down oscillation

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0964.png)

When we apply the Born rule we are squaring these two waves and adding them together
* but it turns out that this value doesn't depend on phase. The magnitude squared of the real and imaginary components stays the same, even as those components move up and down
* It is that magnitude squared that we can observe; it determines the probability of finding the particle at a given position
* the phase itself is fundamentally unobservable. You can shift the phase by any amount and you wouldn't change the resulting position probabilities, as long as you apply the same shift to both the real and the imaginary components.

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0965.png)


In fact as long as you make the same shift across the entire wave function, all the observables are unchanged.
* We call this form of transformation a global phase shift, and it's analogous to transforming our altitude zero point up or down by the same amount everywhere.
* the equations of quantum mechanics have what we call **global phase invariance**
* **Global phase is a Gauge symmetry of the system**

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0966.png)



Reminder: trigonometry:

> $\begin{aligned} \mathrm{e}^{\mathrm{i} x} &=\sum_{k=0}^{\infty} \frac{(\mathrm{i} x)^{k}}{k !}=\sum_{l=0}^{\infty} \frac{(\mathrm{i} x)^{2 l}}{(2 l) !}+\sum_{l=0}^{\infty} \frac{(\mathrm{i} x)^{2 l+1}}{(2 l+1) !} \\ &=\underbrace{\sum_{l=0}^{\infty}(-1)^{l} \frac{x^{2 l}}{(2 l) !}}_{\cos x}+\mathrm{i} \underbrace{\sum_{l=0}^{\infty}(-1)^{l} \frac{x^{2 l+1}}{(2 l+1) !}}_{\sin x} \\  \mathrm{e}^{\mathrm{i} x}&=\cos x+\mathrm{i} \sin x \\  \mathrm{e}^{\mathrm{i} \varphi}&=\cos \varphi+\mathrm{i} \sin \varphi \\ \mathrm{e}^{\mathrm{i} \varphi}&=a+\mathrm{i} b \quad \text{with } a=\cos \varphi,\ b=\sin \varphi \end{aligned}$ This is the so-called [Euler formula](https://de.m.wikipedia.org/wiki/Eulersche_Formel)!
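As a quick numerical sanity check of the Euler formula, the sketch below compares three ways of computing $e^{ix}$: the library exponential, $\cos x + i \sin x$, and a truncated power series. The function names and the truncation at 30 terms are assumptions chosen for illustration.

```python
import cmath
import math


def euler_exponential(x):
    """Left-hand side: e^{ix} from the complex exponential."""
    return cmath.exp(1j * x)


def euler_trig(x):
    """Right-hand side: cos x + i sin x."""
    return complex(math.cos(x), math.sin(x))


def euler_series(x, terms=30):
    """Truncated power series sum_k (ix)^k / k! (assumed cut-off: 30 terms)."""
    return sum((1j * x) ** k / math.factorial(k) for k in range(terms))


sample_points = [0.0, 0.5, math.pi, -2.0]
max_deviation = max(
    max(abs(euler_exponential(x) - euler_trig(x)),
        abs(euler_series(x) - euler_trig(x)))
    for x in sample_points
)
```

For the sample points chosen here the three results agree up to floating-point rounding; for much larger $|x|$ the truncated series would need more terms.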


![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Sine_cosine_one_period.svg/600px-Sine_cosine_one_period.svg.png)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_040.jpg)

![ggg](https://upload.wikimedia.org/wikipedia/commons/3/3b/Circle_cos_sin.gif)

* The only reason phase is important is because it brings about interference effects. And interference effects are only dependent on the difference in phase between the two waves (we will abstract basis states to waves for now).

> **Therefore, we can say that it’s the difference that counts, and not the absolute value.**

* For example, if the phase difference is π radians then the waves would cancel each other out.


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/quantum_179.jpg)

We can conclude that the absolute value of the phase shift for both waves is meaningless to the interference. For as long as they both retain the phase difference, then the interference effect will be constant, and that’s what matters!
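The statement that only the phase difference matters can be illustrated with two unit-amplitude waves written as complex exponentials. This is a minimal sketch; `combined_amplitude` is a hypothetical helper name, and modelling each wave by a single phasor $e^{i\varphi}$ is a simplifying assumption.

```python
import cmath
import math


def combined_amplitude(phase1, phase2):
    """Amplitude of the sum of two unit waves e^{i*phase1} + e^{i*phase2}."""
    return abs(cmath.exp(1j * phase1) + cmath.exp(1j * phase2))


# Phase difference pi: the waves cancel.
destructive = combined_amplitude(math.pi / 2, 3 * math.pi / 2)

# Same phase difference, but both phases shifted by a common (global) amount.
shift = 0.4
destructive_shifted = combined_amplitude(math.pi / 2 + shift, 3 * math.pi / 2 + shift)

# Phase difference 0: the waves add up.
constructive = combined_amplitude(0.0, 0.0)
```

Analytically the amplitude is $2\left|\cos\frac{\varphi_1-\varphi_2}{2}\right|$, which depends on the two phases only through their difference.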


**Why the global phase doesn’t matter**

As observed above, what matters is the phase difference. So think about it, if you’re describing the phase of two waves, it’s redundant to state both phases. **The better approach is to just state the phase difference**.

* The global phase is the phase shift that both waves have in common.

* For example, waves one and two have phases of (π/2) radians and (3π/2) radians, respectively. In this case, the phase (π/2) is a global phase, since the phase difference (what actually counts) is π radians → (3π/2 - π/2).

* Using the same example, we can define the relative phase (also known as the local phase) as the phase difference (π rad).

We’ve found a method to **bypass the need for a fourth dimension by factoring out the global phase** and replacing the second phase with the relative phase. We can represent this mathematically.

A general qubit state can be written

> $|\psi\rangle=\alpha|0\rangle+\beta|1\rangle$

with complex numbers α and β, where the normalization constraint requires that:


> $|\alpha|^{2}+|\beta|^{2}=1$

As previously stated, we can express the amplitudes in polar coordinates as (General equation for a qubit state):

> $|\psi\rangle=r_{\alpha} e^{i \phi_{\alpha}}|0\rangle+r_{\beta} e^{i \phi_{\beta}}|1\rangle$

with four real parameters:

> $r_{\alpha}, \phi_{\alpha}, r_{\beta}$ and $\phi_{\beta}$

**However, the only measurable quantities are the probabilities |α|² and |β|², so multiplying the state by an arbitrary factor $e^{iγ}$ (global phase) has no observable consequences, because**:

> $\left|e^{i \gamma} \alpha\right|^{2}=\left(e^{i \gamma} \alpha\right)^{*}\left(e^{i \gamma} \alpha\right)=\left(e^{-i \gamma} \alpha^{*}\right)\left(e^{i \gamma} \alpha\right)=\alpha^{*} \alpha=|\alpha|^{2}$

Therefore, we can factor out $e^{i\phi_{\alpha}}$ from the general equation:

> $|\psi\rangle=e^{i \phi_{\alpha}}\left(r_{\alpha}|0\rangle+r_{\beta} e^{i\left(\phi_{\beta}-\phi_{\alpha}\right)}|1\rangle\right)$

Now, if you calculate the measurement probabilities from $|\psi\rangle$, the factor $e^{i\phi_{\alpha}}$ in front vanishes by the argument above. This is why we call it the global phase. The relative phase, in contrast, is the phase difference $\phi_{\beta}-\phi_{\alpha}$. It is an observable quantity that manifests itself through interference effects.

Let’s consolidate the above equation into:

> $|\psi\rangle=r_{\alpha}|0\rangle+r_{\beta} e^{i\left(\phi_{\beta}-\phi_{\alpha}\right)}|1\rangle=r_{\alpha}|0\rangle+r_{\beta} e^{i \phi}|1\rangle$

> $r_{\alpha} \in \mathbb{R}, r_{\beta} \in \mathbb{R}, \phi \in \mathbb{R} \mid \phi=\phi_{\beta}-\phi_{\alpha}$

where $r_{\alpha}$, $r_{\beta}$ and $\phi$ are all real parameters.

**Notice that this equation can be represented in 3-D, as the global phase is gone.**
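The factoring-out step above can be sketched in a few lines. The helper name `remove_global_phase` is hypothetical; it takes the two complex amplitudes and returns the three real parameters $(r_\alpha, r_\beta, \phi)$ with $\phi = \phi_\beta - \phi_\alpha$ reduced to $[0, 2\pi)$. Multiplying the state by any global phase should leave the triple unchanged.

```python
import cmath
import math


def remove_global_phase(alpha, beta):
    """Return (r_alpha, r_beta, phi) with phi = phi_beta - phi_alpha in [0, 2*pi)."""
    r_alpha, r_beta = abs(alpha), abs(beta)
    phi = (cmath.phase(beta) - cmath.phase(alpha)) % (2 * math.pi)
    return r_alpha, r_beta, phi


# Example state (|0> + i|1>)/sqrt(2): relative phase pi/2.
alpha = 1 / math.sqrt(2)
beta = 1j / math.sqrt(2)
params = remove_global_phase(alpha, beta)

# The same state multiplied by a global phase e^{i*gamma} (gamma is arbitrary).
gamma = 0.3
global_phase = cmath.exp(1j * gamma)
params_shifted = remove_global_phase(global_phase * alpha, global_phase * beta)

# Measurement probabilities are untouched by the global phase as well.
prob_zero = abs(alpha) ** 2
prob_zero_shifted = abs(global_phase * alpha) ** 2
```

One caveat of this sketch: when $r_\alpha = 0$ or $r_\beta = 0$ the relative phase is not defined, and the function simply returns whatever `cmath.phase` gives for a zero amplitude.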



https://pavanjayasinha.medium.com/but-what-is-a-quantum-phase-factor-d05c15c321fe

**Physical Meaning of Phase**

> Remember: $e^{2 \pi i}$ = 1 (Identity)

* In quantum mechanics, a phase factor is a complex coefficient $e^{i \theta}$ that multiplies a ket $|\psi\rangle$ or bra $\langle\phi|$.

* <font color="blue">**It does not, in itself, have any physical meaning**, since the introduction of a phase factor does not change the expectation values of a Hermitian operator.

> That is, the values of $\langle\phi|A| \phi\rangle$ and $\left\langle\phi\left|e^{-i \theta} A e^{i \theta}\right| \phi\right\rangle$ are the same.

* <font color="red">However, differences in phase factors between two interacting quantum states can sometimes be measurable (such as in the Berry phase) and this can have important consequences.

https://en.wikipedia.org/wiki/Phase_factor

* When people say that the phase doesn't matter, they mean the overall, "global" phase. In other words, the state $|0\rangle$ is equivalent to $e^{i \theta}|0\rangle$, the state $|1\rangle$ is equivalent to $e^{i \theta^{\prime}}|1\rangle$, and the state $|0\rangle+|1\rangle$ is equivalent to $e^{i \theta^{\prime \prime}}(|0\rangle+|1\rangle)$.

> See also Euler's formula: $e^{i \phi}$ https://mathepedia.de/Eulersche_Formel.html

> Note that "equivalence" is not preserved under addition, since $e^{i \theta}|0\rangle+e^{i \theta^{\prime}}|1\rangle$ is not equivalent to $|0\rangle+|1\rangle$, because there can be a relative phase $e^{i\left(\theta-\theta^{\prime}\right)}$.

* If we wanted to describe this very simple fact with unnecessarily big words, we could say something like "the complex projective Hilbert space of rays, the set of equivalence classes of nonzero vectors in the Hilbert space under multiplication by complex phase, cannot be endowed with the structure of a vector space".

* Because the equivalence doesn't play nicely with addition, **it's best to just ignore the global phase ambiguity whenever you're doing real calculations**. Finally, when you're done with the entire calculation, and arrive at a state, you are free to multiply that final result by an overall phase.

https://physics.stackexchange.com/questions/552796/the-importance-of-the-phase-in-quantum-mechanics

* Phase: Any one point or portion in a recurring series of changes, as in the changes of motion of one of the particles constituting a wave or vibration; one portion of a series of such changes, in distinction from a contrasted portion, as the portion on one side of a position of equilibrium, in contrast with that on the opposite side.


https://courses.lumenlearning.com/boundless-chemistry/chapter/orbital-shapes/

In principle, we need four real numbers to describe a qubit, two for $\alpha$ and two for $\beta$. The constraint $|\alpha|^{2}+|\beta|^{2}=1$ reduces this to three numbers.

In quantum mechanics, two vectors that differ by a global phase factor are considered equivalent. A global phase factor is a complex number of unit modulus multiplying the state. By eliminating this factor, a qubit can be described by two real numbers $\theta$ and $\phi$ as follows:

>$
|\psi\rangle=\cos \frac{\theta}{2}|0\rangle+\mathrm{e}^{\mathrm{i} \phi} \sin \frac{\theta}{2}|1\rangle
$

where $0 \leq \theta \leq \pi$ and $0 \leq \phi<2 \pi .$ In the above notation, state $|\psi\rangle$ can be represented by a point on the surface of a sphere of unit radius, called the Bloch sphere. Numbers $\theta$ and $\phi$ are spherical angles that locate the point that describes $|\psi\rangle$, as shown in Fig. A.1. The vector shown there is given by

> $\left[\begin{array}{c}\sin \theta \cos \phi \\ \sin \theta \sin \phi \\ \cos \theta\end{array}\right]$

When we disregard global phase factors, there is a one-to-one correspondence between the quantum states of a qubit and the points on the Bloch sphere. State $|0\rangle$ is at the north pole of the sphere, because it is obtained by taking $\theta=0 .$ State $|1\rangle$ is at the south pole. States

> $
|\pm\rangle=\frac{|0\rangle \pm|1\rangle}{\sqrt{2}}
$

are the intersection points of the $x$-axis and the sphere, and states $(|0\rangle \pm \mathrm{i}|1\rangle) / \sqrt{2}$ are the intersection points of the $y$-axis with the sphere.

The representation of classical bits in this context is given by the poles of the Bloch sphere, and the representation of the probabilistic classical bit, that is, 0 with probability $p$ and 1 with probability $1-p$, is given by the point on the $z$-axis with coordinate $2 p-1$. The interior of the Bloch sphere is used to describe the states of a qubit in the presence of decoherence.
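The correspondence between qubit states and points on the Bloch sphere can be sketched as follows. The function names are hypothetical. The formula used in `bloch_vector`, $(2\,\mathrm{Re}(\alpha^{*}\beta),\ 2\,\mathrm{Im}(\alpha^{*}\beta),\ |\alpha|^{2}-|\beta|^{2})$, is the standard expression for a pure state and reduces to $(\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$ for the parametrization given above.

```python
import cmath
import math


def state_from_angles(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)


def bloch_vector(alpha, beta):
    """Bloch vector (x, y, z) of the normalized pure state alpha|0> + beta|1>."""
    overlap = complex(alpha).conjugate() * beta
    return (2 * overlap.real, 2 * overlap.imag, abs(alpha) ** 2 - abs(beta) ** 2)


def classical_bit_point(p):
    """Point on the z-axis for a classical bit that is 0 with probability p."""
    return (0.0, 0.0, 2 * p - 1)


s = 1 / math.sqrt(2)
north_pole = bloch_vector(1, 0)          # |0>
south_pole = bloch_vector(0, 1)          # |1>
plus_state = bloch_vector(s, s)          # |+>  -> x-axis
plus_i_state = bloch_vector(s, 1j * s)   # (|0> + i|1>)/sqrt(2) -> y-axis

theta, phi = 1.1, 2.3  # arbitrary test angles
from_angles = bloch_vector(*state_from_angles(theta, phi))
expected = (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

A fair classical coin (`classical_bit_point(0.5)`) lands at the centre of the sphere, which is consistent with the remark that the interior describes states that are not pure.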

#### <font color="blue">**Mathematics**

###### *Number Theory*

Every rational number is algebraic, and some irrational numbers are too. https://lnkd.in/eG_NgkCs


**Types of numbers**

https://de.m.wikipedia.org/wiki/Liste_besonderer_Zahlen

https://de.m.wikipedia.org/wiki/Zahldarstellung

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/number_001.jpg)

[Large_numbers](https://en.m.wikipedia.org/wiki/Large_numbers), [Names of large numbers](https://en.m.wikipedia.org/wiki/Names_of_large_numbers), [Liste besonderer Zahlen](https://de.m.wikipedia.org/wiki/Liste_besonderer_Zahlen)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1631.png)

* [Constructible numbers](https://en.m.wikipedia.org/wiki/Constructible_number)
* [Algebraic numbers](https://en.m.wikipedia.org/wiki/Algebraic_number)
* [Transcendental numbers](https://en.m.wikipedia.org/wiki/Transcendental_number)
* [Computable numbers](https://en.m.wikipedia.org/wiki/Computable_number)
* **Non-Computable numbers**
  * [Chaitin constant](https://en.m.wikipedia.org/wiki/Chaitin%27s_constant)
  * [Most real numbers are unknown to us](https://www.spektrum.de/kolumne/die-meisten-reellen-zahlen-sind-nicht-berechenbar/2133762) (they do not even obey a computation rule)
* Special cross-section: [Normal numbers](https://en.m.wikipedia.org/wiki/Normal_number)
  * [Champernowne’s constant](https://en.m.wikipedia.org/wiki/Champernowne_constant) (whole numbers) - normal and transcendental
  * [Copeland-Erdös-number](https://de.m.wikipedia.org/wiki/Copeland-Erdős-Zahl) (primes)


Source: [All the Numbers - Numberphile](https://www.youtube.com/watch?v=5TkIe60y2GI&t=458s)

**Nearly empty section: normal and non-computable numbers**: Proofs show that this is where most numbers live: almost all real numbers are normal and almost all are non-computable. So this section should be full, yet we can name hardly any concrete examples. Chaitin's constant (listed above) is one of the few numbers known to be both normal and non-computable.

https://www.quantamagazine.org/how-a-mathematical-paradox-allows-infinite-cloning-20210826/

The Banach–Tarski paradox is a theorem in set-theoretic geometry, which states the following: Given a solid ball in three-dimensional space, there exists a decomposition of the ball into a finite number of disjoint subsets, which can then be put back together in a different way to yield two identical copies of the original ball.



**Excursus: Transfinite numbers**

In mathematics, transfinite numbers are numbers that are greater than all finite numbers. They are often used in set theory and topology.

There are two main types of transfinite numbers: ordinal numbers and cardinal numbers. Ordinal numbers are used to order sets, while cardinal numbers are used to count the number of elements in a set.

Some examples of transfinite numbers are:

* **$\omega$** (omega): The smallest transfinite ordinal number. It is the order type of the natural numbers.
* **$\omega^2$** (omega squared): The order type of $\omega$ copies of $\omega$ placed one after another, for example the pairs of natural numbers in lexicographic order.
* **$\aleph_0$** (aleph null): The smallest transfinite cardinal number. It is the cardinality of the set of natural numbers.
* **$\aleph_1$** (aleph one): The smallest uncountable cardinal. It is the cardinality of the set of all countable ordinal numbers.
* **$\aleph_2$** (aleph two): The next cardinal after $\aleph_1$. It is the cardinality of the set of all ordinal numbers of cardinality at most $\aleph_1$.

Transfinite numbers make it possible to state and answer questions that finite numbers cannot express. For example, whether a given infinite set is countable or uncountable is a question about its transfinite cardinality.

The study of transfinite numbers is a complex and fascinating area of mathematics. It has applications in many different areas of mathematics, including set theory, topology, and logic.

**what is the relationship between transfinite numbers and infinity?**

Transfinite numbers are numbers that are greater than all finite numbers. Infinity is a concept that refers to something that is boundless or endless.

The relationship between transfinite numbers and infinity is that transfinite numbers are a way of quantifying infinity. In other words, transfinite numbers allow us to talk about different sizes of infinity.

For example, each natural number (1, 2, 3, ...) is finite, but the set of all natural numbers is infinite. It is countable: a set is called countable if it can be put into a one-to-one correspondence with the natural numbers or a subset of them.

The set of all real numbers is uncountable, which means that it cannot be put into a one-to-one correspondence with the set of all positive integers. This means that the set of all real numbers is larger than the set of all natural numbers.

Transfinite numbers can be used to quantify the size of the set of all real numbers. The smallest transfinite cardinal number is $\aleph_0$, which is the cardinality of the set of all natural numbers. The cardinality of the set of all real numbers is $2^{\aleph_0}$ (the cardinality of the continuum), which is strictly larger than $\aleph_0$. Whether it is equal to $\aleph_1$ is exactly the continuum hypothesis.

There are many different transfinite numbers, and they can be used to quantify the size of different sets.

Here are some other ways to think about the relationship between transfinite numbers and infinity:

* Transfinite numbers are a way of making infinity more precise.
* Transfinite numbers allow us to compare different infinities.
* Transfinite numbers let us state questions about infinite sets that cannot even be expressed with finite numbers.

http://peter-ripota.de/mathe/ordinalzahlen-einfach-weiterzaehlen/von-%CF%89-zu-%CE%B50-von-klein-omega-zu-epsilon-null/

**Number theory**

* https://de.m.wikipedia.org/wiki/Zahlentheorie

* https://de.m.wikipedia.org/wiki/Algebraische_Zahlentheorie

* https://de.m.wikipedia.org/wiki/Diophantische_Gleichung: an equation given by a polynomial function with integer coefficients, for which only integer solutions are sought

https://en.m.wikipedia.org/wiki/Analytic_number_theory

https://mathworld.wolfram.com/topics/NumberTheory.html

> [Number theory Full Course [A to Z]](https://www.youtube.com/watch?v=19SW3P_PRHQ)

> [The Continuity of Splines](https://www.youtube.com/watch?v=jvPPXbo87ds)

Infinity is not a number, it's an idea, a concept. Hence you cannot set something equal to infinity.

Aleph Null

https://de.m.wikipedia.org/wiki/Aleph-Funktion

https://www.spektrum.de/lexikon/mathematik/amp-aleph-0-aleph-null/443

[Counting beyond infinity](https://youtu.be/SrU9YDoXE88)

**Infinity $\infty$**

* The continuum hypothesis (Cantor) can be neither proved nor disproved from the standard axioms (Gödel, Cohen). Mathematics works consistently both under the assumption that the continuum hypothesis holds and under the assumption that it does not.

* term of size (how many?) is: **Cardinality**

* Rules for comparing cardinalities: injection, surjection, bijection

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_319.png)

* How does the cardinality of a set A compare to that of its power set (= the set of all subsets of A, including A itself)?

* The cardinality of the power set of A is strictly greater than the cardinality of its original set A, even when it's infinite

* [How Big are All Infinities Combined? (Cantor's Paradox) | Infinite Series](https://www.youtube.com/watch?v=TbeA1rhV0D0&list=WL&index=12&t=86s)
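For finite sets, Cantor's diagonal argument behind the power-set statement above can be checked by brute force. This is a sketch under stated assumptions: the function names are hypothetical, and the exhaustive search only covers a small finite set, whereas the theorem itself holds for every set. For each map $f: A \to \mathcal{P}(A)$, the diagonal set $D = \{a \in A : a \notin f(a)\}$ is never one of the values of $f$, so $f$ cannot be surjective.

```python
from itertools import combinations, product


def power_set(elements):
    """All subsets of the given finite collection, as frozensets."""
    items = list(elements)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def diagonal_set(mapping):
    """Cantor's diagonal set D = {a : a not in mapping[a]}."""
    return frozenset(a for a, subset in mapping.items() if a not in subset)


def diagonal_always_missed(elements):
    """Check every map A -> P(A): the diagonal set is never in the image."""
    items = list(elements)
    subsets = power_set(items)
    for images in product(subsets, repeat=len(items)):
        mapping = dict(zip(items, images))
        if diagonal_set(mapping) in images:
            return False
    return True


base_set = range(3)
number_of_subsets = len(power_set(base_set))       # 2**3 subsets
no_surjection = diagonal_always_missed(base_set)   # checks all 8**3 maps
```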

**Smallest Sizes of Infinity**

* intuition is often misleading in mathematics!

* There are infinitely many sizes of infinity

* Smallest infinity: natural numbers (counting numbers). Here: even numbers and natural numbers are the same size!

  * Natural numbers: $\aleph_0$ "Aleph-naught" (the smallest infinite cardinality)

* We need to find a way to tell which infinity is bigger without counting them.

* You use something called Bijection: if you can pair up two sets, they are the same size

	* even numbers and natural numbers are the same size, because there is a bijection between the two sets! Each natural number is paired with 2 times itself: 1 with 2, 2 with 4, 3 with 6 etc. This leaves out all the odd numbers, but the set didn't get smaller

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_318.png)

	* the natural numbers are also the same size as the integers: Integers include all natural numbers + all the negative whole numbers. We can pair them up exactly, so they must be the same size.

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_317.png)
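The two pairings described above can be written down as explicit functions. This is a minimal sketch: the function names are hypothetical, the natural numbers are taken to start at 1 as in the text, and the zigzag pairing used for the integers (0, 1, -1, 2, -2, ...) is one common choice that may differ from the one drawn in the figure.

```python
def natural_to_even(n):
    """Pair each natural number n = 1, 2, 3, ... with the even number 2n."""
    return 2 * n


def natural_to_integer(n):
    """Zigzag pairing: 1 -> 0, 2 -> 1, 3 -> -1, 4 -> 2, 5 -> -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)


def integer_to_natural(z):
    """Inverse of the zigzag pairing."""
    return 2 * z if z > 0 else -2 * z + 1


first_evens = [natural_to_even(n) for n in range(1, 6)]
first_integers = [natural_to_integer(n) for n in range(1, 8)]
round_trip_ok = all(integer_to_natural(natural_to_integer(n)) == n
                    for n in range(1, 1001))
```

Because each function has an inverse, no element on either side is left unpaired, which is what "same size" means here.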

* Above the natural numbers, in terms of size of infinity, are the real numbers.

	* an interval on the real number line is also an infinity. And an interval on the real line between 0 and 5 has the same size as an interval between 0 and 10. They are all as big as the real numbers!!

	* You can show that any interval is the same size as the entire real number line !

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_316.png)
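Explicit bijections make the interval claims concrete. The sketch below uses $x \mapsto 2x$ to map $(0, 5)$ onto $(0, 10)$ and $x \mapsto \tan(\pi(x - \tfrac12))$ to map $(0, 1)$ onto the whole real line. Both maps and all function names are assumptions chosen for illustration; the figure above may use a different construction.

```python
import math


def stretch(x):
    """Bijection from the interval (0, 5) onto (0, 10)."""
    return 2 * x


def interval_to_reals(x):
    """Bijection from the open interval (0, 1) onto the real line."""
    return math.tan(math.pi * (x - 0.5))


def reals_to_interval(y):
    """Inverse map: every real number lands strictly between 0 and 1."""
    return math.atan(y) / math.pi + 0.5


samples = [-50.0, -1.0, 0.0, 2.5, 50.0]
images = [reals_to_interval(y) for y in samples]
recovered = [interval_to_reals(x) for x in images]
```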

* Cantor wondered if there is an infinity between the natural and the real numbers?

	* CONTINUUM HYPOTHESIS: there is no size of infinity between the natural numbers and the real numbers

	* Decades later, mathematicians found out that the continuum hypothesis is independent of Zermelo-Fraenkel set theory with choice (ZFC): it cannot be proved or disproved using the standard axioms of mathematics

* So in one model of the tower of infinities the real numbers sit directly above the natural numbers

* But in other models there are many infinities in between

* The rules of maths don't say that one tower is correct and the other wrong.

> So mathematics seems to be surprisingly agnostic with regard to which hierarchy of infinities is correct

[Problems with Zero - Numberphile](https://www.youtube.com/watch?v=BRRolKTlF6Q)

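* something divided by zero: approaching zero from the negative side the quotient tends to negative infinity, from the positive side to positive infinity, hence the result is undefined

* zero to the power of 0: in the complex plane the limit depends on the path along which you approach, so you get different answers, hence the result is undefined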

https://de.m.wikipedia.org/wiki/Null

**Modular Forms (Algebraic Number Theory)**

[Why is there no quintic formula](https://youtu.be/1EWUsef0iFs)

[Counting points on the E8 lattice with modular forms (theta function)](https://youtu.be/AxPwhJTHxSg)

[The bridge between number theory and complex analysis](https://youtu.be/zLEyIT_BCgk)

[Elliptic curves and modular forms (the proof of Fermat's last theorem)](https://youtu.be/grzFM5XciAY)

[The math behind Fermat's last theorem (modular forms)](https://youtu.be/z7A_bSl8kIw)

[Algebraic number theory - an illustrated guide (is 5 a prime?)](https://youtu.be/4m_EaWA08H0)

[Why is Pi here? (half factorial without gamma function)](https://youtu.be/y-yb_EmVtg0)

https://en.m.wikipedia.org/wiki/Eisenstein_series

https://en.m.wikipedia.org/wiki/Modular_form

https://en.m.wikipedia.org/wiki/Modular_group

Video 1: https://youtu.be/z7A_bSl8kIw

Video 2: https://youtu.be/grzFM5XciAY

Video 3: https://youtu.be/zLEyIT_BCgk

**Can you explain what the modularity condition is in modular forms?**

The modularity condition is what a function must satisfy in order to be considered a modular form. It is a technical condition, but for a modular form of weight $k$ for the modular group SL(2,Z) it can be roughly summarized as follows:

* The function must be holomorphic (complex analytic) on the upper half-plane.
* The function must transform in a prescribed way under the action of the modular group SL(2,Z): it picks up a factor $(cz+d)^k$. Only for weight $k = 0$ is this plain invariance.
* The function must have a Fourier series expansion that only has non-negative powers of q.

The modularity condition is important because it ensures that modular forms have certain properties that make them useful in mathematics. For example, modular forms are known to be related to elliptic curves, which are important objects in number theory.

The modularity condition is also interesting from a mathematical perspective. It is a restrictive condition: the modular forms of a given weight and level form a finite-dimensional vector space. This makes modular forms a rich and well-studied class of functions.

Here is a more detailed explanation of the modularity condition:

* **Holomorphic on the upper half-plane:** This means that the function must be complex differentiable, and in particular have no singularities, at every point of the upper half-plane, which is the set of complex numbers z such that Im(z) > 0.
* **Transformation under SL(2,Z):** The modular group is the group of 2x2 matrices with integer entries $a, b, c, d$ and determinant $ad - bc = 1$. For every such matrix the function must satisfy $f\left(\frac{az+b}{cz+d}\right)=(cz+d)^{k} f(z)$.
* **Fourier series expansion with only non-negative powers of q:** This means that the function can be written as $f(z)=\sum_{n \geq 0} a_n q^n$ with $q = e^{2\pi i z}$, so that |q| < 1 on the upper half-plane.

The modularity condition is a powerful tool in mathematics. It has been used to prove many important results in number theory, and it continues to be an active area of research.
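A numerical check makes the transformation rule tangible. The sketch below uses the Eisenstein series $E_4(z) = 1 + 240\sum_{n\ge1}\sigma_3(n)q^n$ with $q = e^{2\pi i z}$, a standard example of a modular form of weight 4, and tests the two generators of the modular group: $z \mapsto z+1$ and $z \mapsto -1/z$. The function names, the test point and the truncation at 40 terms are assumptions made for illustration.

```python
import cmath


def sigma(power, n):
    """Divisor sum sigma_power(n) = sum of d**power over all divisors d of n."""
    return sum(d ** power for d in range(1, n + 1) if n % d == 0)


def eisenstein_e4(z, terms=40):
    """Truncated q-expansion of E4 (assumed cut-off: 40 terms)."""
    q = cmath.exp(2j * cmath.pi * z)
    return 1 + 240 * sum(sigma(3, n) * q ** n for n in range(1, terms + 1))


z = complex(0.2, 1.3)  # arbitrary point in the upper half-plane

# Generator T: z -> z + 1 (a = b = d = 1, c = 0), factor (cz + d)^4 = 1.
translation_defect = abs(eisenstein_e4(z + 1) - eisenstein_e4(z))

# Generator S: z -> -1/z (a = d = 0, b = -1, c = 1), factor (cz + d)^4 = z^4.
inversion_defect = abs(eisenstein_e4(-1 / z) - z ** 4 * eisenstein_e4(z))
```

Both defects should vanish up to rounding. The truncation is harmless here because $|q|$ is small at both $z$ and $-1/z$; closer to the real axis many more terms would be needed.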

**Are modular forms used in quantum computing?**

Not in any established way. What the well-known quantum algorithms rely on is modular *arithmetic* (arithmetic modulo $n$), which is a different notion from modular *forms*; the two are easy to confuse because of the shared name.

* **Quantum algorithms:** Shor's algorithm (1994) solves integer factorization and the discrete logarithm problem in polynomial time on a quantum computer. Its core is modular exponentiation combined with the quantum Fourier transform for period finding; modular forms do not appear in it.
* **Quantum error correction and quantum cryptography:** the standard constructions in these areas are likewise not based on modular forms.

Whether modular forms have a genuine role to play in quantum computation is therefore an open question for these notes rather than an established fact.

**What is the discrete logarithm problem (DLP) and integer factorization?**

The discrete logarithm problem (DLP) is a computational problem in mathematics and computer science. It is the problem of finding the exponent k in the equation g^k = a (mod n), where g, a and n are known integers and k is the unknown.

The DLP is a difficult problem to solve in general. There is no known polynomial-time classical algorithm for solving the DLP, and it is widely believed that none exists. This makes the DLP a useful problem in cryptography, as it can be used to construct secure cryptographic systems.

For example, the security of the Diffie-Hellman key exchange protocol rests on the hardness of the DLP. The Elliptic Curve Digital Signature Algorithm (ECDSA) rests on the analogous problem on elliptic curves.

The DLP is a very important problem in cryptography, and it is likely to remain so for many years to come.

Here are some examples of how the DLP is used in cryptography:

* **Diffie-Hellman key exchange:** The Diffie-Hellman key exchange protocol is a method for two parties to securely agree on a secret key over an insecure channel. Each party publishes g raised to a secret exponent; an eavesdropper would have to solve the DLP to recover the exponents.
* **Elliptic Curve Digital Signature Algorithm (ECDSA):** The ECDSA is a digital signature algorithm whose security relies on the elliptic-curve version of the DLP. The ECDSA is a popular choice for digital signatures because it is relatively efficient and secure.

The DLP is a difficult problem to solve, but it is not impossible. There are classical algorithms that solve the DLP, but they are not efficient for large inputs. If a polynomial-time classical algorithm for solving the DLP were to be found, it would have a major impact on cryptography.
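A toy example shows both sides of the DLP: computing $g^k \bmod n$ is fast, while recovering $k$ is done here by plain exhaustive search. This is a sketch with deliberately tiny, insecure textbook parameters ($p = 23$, $g = 5$); the function name `discrete_log_bruteforce` and the secret exponents are hypothetical.

```python
def discrete_log_bruteforce(g, a, n):
    """Smallest k >= 0 with g**k = a (mod n), or None if there is none.

    Exhaustive search: the running time grows linearly with n, which is
    hopeless for the several-hundred-digit moduli used in practice.
    """
    value = 1 % n
    for k in range(n):
        if value == a % n:
            return k
        value = (value * g) % n
    return None


# Toy Diffie-Hellman key exchange (insecure demonstration parameters).
p, g = 23, 5
alice_secret, bob_secret = 4, 3

alice_public = pow(g, alice_secret, p)   # sent over the insecure channel
bob_public = pow(g, bob_secret, p)       # sent over the insecure channel

alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)

# An eavesdropper who can solve the DLP recovers Alice's secret exponent.
recovered_secret = discrete_log_bruteforce(g, alice_public, p)
```

The three-argument form of Python's built-in `pow` performs the modular exponentiation efficiently; the asymmetry between that call and the linear search is the point of the example.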

Peter Shor's algorithm is a quantum algorithm for integer factorization. It was developed in 1994, and it is one of the most important results in quantum computing. Shor's algorithm shows that it is possible to factor large integers in polynomial time using a quantum computer. This has major implications for cryptography, as many of the most widely used cryptographic systems are based on the difficulty of factoring large integers.

The discrete logarithm problem is closely related to integer factorization: Shor's 1994 work also gives a polynomial-time quantum algorithm for it. This has similar implications for cryptography, as many widely used cryptographic systems are based on the difficulty of solving the discrete logarithm problem. Both algorithms are probabilistic: a single run can fail to produce the answer, in which case it is repeated, and a candidate answer is cheap to verify classically.

Craig Gidney's work concerns the cost of actually running these algorithms, for example the estimate (with Martin Ekerå) of the number of noisy qubits and the time needed to factor 2048-bit RSA integers. It refines the implementation of Shor's approach rather than providing a separate algorithm.

In summary, Shor's results show that both factoring large integers and solving the discrete logarithm problem are possible in polynomial time on a quantum computer, and later work such as Gidney's estimates what that would cost in practice. These results have major implications for cryptography, as many of the most widely used cryptographic systems are based on the difficulty of one of these two problems.
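The classical half of Shor's factoring algorithm can be sketched without any quantum hardware: once the multiplicative order $r$ of $a$ modulo $N$ is known, factors follow from two gcd computations. In the sketch below the order is found by brute force, which stands in for the quantum period-finding step and takes exponential time in general; the function names are hypothetical.

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest r >= 1 with a**r = 1 (mod n); requires gcd(a, n) == 1.

    Brute force stand-in for the quantum period-finding subroutine.
    """
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factors_from_order(a, n):
    """Classical post-processing: try to split n using the order of a.

    Returns a pair of nontrivial factors, or None if this choice of a
    fails (odd order, or a**(r/2) = -1 mod n) and another a is needed.
    """
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None
    half_power = pow(a, r // 2, n)
    if half_power == n - 1:
        return None
    return tuple(sorted((gcd(half_power - 1, n), gcd(half_power + 1, n))))


order_of_7_mod_15 = multiplicative_order(7, 15)
factors_of_15 = factors_from_order(7, 15)
unlucky_choice = factors_from_order(14, 15)  # 14 = -1 mod 15, so this fails
```

The `None` branch is where the probabilistic nature of the algorithm shows up: a failed base is simply replaced by another random one.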

**Belphegor's prime**

1000000000000066600000000000001
It is 666 with 13 zeros on either side and a 1 at each end.

https://de.wikipedia.org/wiki/Belphegors_Primzahl
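The description can be turned into a small check: build the number from its parts, confirm that it is a palindrome, and run a primality test. The Miller-Rabin test with a fixed list of small bases used below is an assumption of this sketch; for a number of this size it is a strong probable-prime test rather than a formal proof.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test with fixed bases (never rejects an actual prime)."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


# 1, thirteen zeros, 666, thirteen zeros, 1.
belphegor = 10 ** 30 + 666 * 10 ** 14 + 1
digits = str(belphegor)
is_palindrome = digits == digits[::-1]
passes_test = is_probable_prime(belphegor)
```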

**Dedekind number**

https://www.scinexx.de/news/technik/mathematik-neunte-dedekind-zahl-geknackt/

https://de.m.wikipedia.org/wiki/Dedekind-Zahl

**Carmichael number**

A [Carmichael number](https://de.m.wikipedia.org/wiki/Carmichael-Zahl) is a natural number with a special prime factorization: it is composite, yet it satisfies $a^{n-1} \equiv 1 \pmod n$ for every $a$ coprime to $n$, just as a prime would.

A Carmichael number is always odd and has at least 3 distinct prime factors. The smallest Carmichael numbers are 561, 1105, 1729.

https://www.spektrum.de/lexikon/mathematik/carmichael-zahl/1414

https://www.faz.net/aktuell/wissen/daniel-larsen-findet-einen-mathebeweis-zu-carmichael-zahlen-18583248.html

**Narcissistic Number**

In number theory, a narcissistic number is a number that equals the sum of its own digits, each raised to the power of the number of digits.

> $153 = 1^3 + 5^3 + 3^3$
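As a quick check of the definition, here is a small sketch (base 10 only; the function name `is_narcissistic` is just illustrative) that lists all three-digit narcissistic numbers:

```python
def is_narcissistic(n: int) -> bool:
    """True if n equals the sum of its decimal digits raised to the digit count."""
    digits = [int(d) for d in str(n)]
    return n == sum(d ** len(digits) for d in digits)


three_digit = [n for n in range(100, 1000) if is_narcissistic(n)]
print(three_digit)  # [153, 370, 371, 407]
```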

https://en.m.wikipedia.org/wiki/Narcissistic_number

**Complex Numbers**

* https://physics.stackexchange.com/questions/155762/complex-numbers-in-quantum-mechanics-and-in-special-relativity

* You can use split-complex numbers in relativity, but ironically complex numbers have proved more popular for this.

https://en.m.wikipedia.org/wiki/Split-complex_number

**Bicomplex number**

* Bicomplex numbers are also called tessarines; the real tessarines are the split-complex numbers.

* In abstract algebra, a bicomplex number is a pair $(w, z)$ of complex numbers constructed by the Cayley–Dickson process that defines the bicomplex conjugate ${\displaystyle (w,z)^{*}=(w,-z)}$, and the product of two bicomplex numbers as

> ${\displaystyle (u,v)(w,z)=(uw-vz,uz+vw).}$

* Then the bicomplex norm is given by

> ${\displaystyle (w,z)^{*}(w,z)=(w,-z)(w,z)=(w^{2}+z^{2},0),}$

* a quadratic form in the first component.

* https://hsm.stackexchange.com/questions/12866/why-are-quaternions-more-popular-than-tessarines-despite-being-non-commutative

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1200.png)

Image source: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.527.356&rep=rep1&type=pdf


https://en.m.wikipedia.org/wiki/Bicomplex_number

Quaternions are useful, in particular for representing rotations of 3-space.

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Hyperkomplexe_Zahlen.svg/519px-Hyperkomplexe_Zahlen.svg.png)

**Dual Numbers**

* https://en.m.wikipedia.org/wiki/Dual_number

* Video: https://youtu.be/ceaNqdHdqtg

**Hyperreal Numbers & Nonstandard Analysis (Infinitesimal Calculus)**

* There are also applications of nonstandard analysis to the theory of stochastic processes, particularly constructions of Brownian motion as random walks.

* https://de.m.wikipedia.org/wiki/Nichtstandardanalysis

* https://de.m.wikipedia.org/wiki/Hyperreelle_Zahl

* https://de.m.wikipedia.org/wiki/Infinitesimalrechnung

* https://en.wikipedia.org/wiki/Surreal_number

* https://en.wikipedia.org/wiki/Hyperreal_number

* https://en.wikipedia.org/wiki/Infinitesimal

**Hypercomplex Numbers**

* [Hypercomplex numbers](https://de.m.wikipedia.org/wiki/Hyperkomplexe_Zahl) are generalizations of the complex numbers.

https://www.quantamagazine.org/the-octonion-math-that-could-underpin-physics-20180720/

**Transcendental numbers:**

* Algebraic numbers: numbers that are a solution of an algebraic equation, e.g. $\sqrt{2}$ solves $x^2 - 2 = 0$

* Transcendental numbers (within the real numbers): numbers that are not a solution of any algebraic equation, e.g. $e$ or $\pi$

> from: https://www.youtube.com/watch?v=P24tmohytXs

* Examples: π (pi) or Euler's number $e$

* The decimal expansion never ends and never repeats: 3.14159265358979323846....

* Cannot be written as a fraction

* A real number (or, more generally, a complex number) is called a [transcendental number](https://de.m.wikipedia.org/wiki/Transzendente_Zahl) if it is not a root of a polynomial (other than the zero polynomial) with integer coefficients. Otherwise it is an algebraic number. Moreover, every real transcendental number is irrational.

* omnem rationem transcendunt (Latin): they are beyond all reason



**Perplex Numbers**

* also called real tessarines

* also: [split-complex number](https://en.m.wikipedia.org/wiki/Split-complex_number), hyperbolic number, double number

* In algebra, a split-complex number (or hyperbolic number, also perplex number, double number) has two real number components $x$ and $y$, and is written $z = x + y j$, where $j^2 = 1$. The conjugate of $z$ is $z^{*} = x - y j$. Since $j^2 = 1$, the product of a number $z$ with its conjugate is $N(z) = z z^{*} = x^2 - y^2$, an [isotropic quadratic form](https://en.m.wikipedia.org/wiki/Isotropic_quadratic_form).

* Very perplexing, right? They are defined as numbers of the form $a + hb$, where $a$ and $b$ are real numbers and $h^2 = 1$ (the same unit that is called $j$ above).

  * But wait, that sounds too easy! Until you realize that $h$ isn't +1 or -1. It is "something else".

* Perplex numbers have many applications outside of algebra and geometry, such as in Quantum Mechanics and the Theory of Relativity.

* Special Relativity: https://aapt.scitation.org/doi/10.1119/1.14605

* Quantum Mechanics: https://www.intlpress.com/site/pub/files/_fulltext/journals/cis/2014/0014/0003/CIS-2014-0014-0003-a001.pdf
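To see what "$j^2 = 1$ but $j \neq \pm 1$" means in practice, here is a minimal sketch of split-complex arithmetic. The class name `SplitComplex` and its methods are illustrative, not a standard library type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitComplex:
    x: float  # real part
    y: float  # coefficient of j, where j*j = +1

    def __mul__(self, other):
        # (x1 + y1 j)(x2 + y2 j) = (x1 x2 + y1 y2) + (x1 y2 + y1 x2) j
        return SplitComplex(self.x * other.x + self.y * other.y,
                            self.x * other.y + self.y * other.x)

    def conjugate(self):
        return SplitComplex(self.x, -self.y)

    def modulus(self):
        # N(z) = z z* = x^2 - y^2 (can be zero or negative)
        return self.x ** 2 - self.y ** 2


one = SplitComplex(1, 0)
j = SplitComplex(0, 1)
z = SplitComplex(3, 2)

print(j * j == one)                                # j squares to 1 without being +1 or -1
print(z * z.conjugate())                           # real part is the modulus, j part is 0
print(SplitComplex(1, 1) * SplitComplex(1, -1))    # zero divisors: product is 0
```

Unlike the complex numbers, the split-complex numbers contain zero divisors such as $1 + j$ and $1 - j$, which lie on the "light cone" $x^2 - y^2 = 0$.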

**p-adic numbers**

* Size is measured differently in the p-adic numbers

* the p-adic absolute value has nothing to do with the real number line, where two numbers are close when their first several digits are the same

* in the p-adic absolute value, two numbers are close when their last digits (in base p) are the same, that is, when their difference is divisible by a high power of p

https://youtu.be/3gyHKCDq1YA

* infinitely many digits to the left of the point: ....985356295141.3

* A [p-adic number](https://de.m.wikipedia.org/wiki/P-adische_Zahl) is a number that can be represented as a power series in a prime number p

* p-adic number systems emerge from modular arithmetic

* https://www.quantamagazine.org/how-the-towering-p-adic-numbers-work-20201019/

* https://www.quantamagazine.org/peter-scholze-and-the-future-of-arithmetic-geometry-20160628/

* https://www.quantamagazine.org/with-a-new-shape-mathematicians-link-geometry-and-numbers-20210719/

* "The binary system is the positional numeral system with base 2 and thus yields the dyadic (2-adic) representation of numbers (dyadics) (Greek δύο = two)." [Source](https://de.m.wikipedia.org/wiki/Dualsystem#Grundrechenarten_im_Dualsystem)
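The notion of p-adic closeness described in this list can be tried out on rational numbers. The sketch below uses the standard definition $|x|_p = p^{-v_p(x)}$, where $v_p(x)$ is the exponent of $p$ in $x$; the function names are illustrative.

```python
from fractions import Fraction


def p_adic_valuation(x, p):
    """Exponent of the prime p in the rational number x (x must be non-zero)."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    num, den = x.numerator, x.denominator
    v = 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def p_adic_abs(x, p):
    """p-adic absolute value |x|_p = p ** (-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** p_adic_valuation(x, p)


# 3 and 628 = 3 + 5**4 agree in their last four base-5 digits, so they are 5-adically close.
print(p_adic_abs(628 - 3, 5))          # 1/625
# 3 and 4 already differ in the last digit, so they are far apart.
print(p_adic_abs(4 - 3, 5))            # 1
# Dividing by powers of p makes a number 5-adically large.
print(p_adic_abs(Fraction(1, 25), 5))  # 25
```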

**Prime Numbers**


* https://de.m.wikipedia.org/wiki/Skewes-Zahl

* https://de.m.wikipedia.org/wiki/Primzahl

* https://de.m.wikipedia.org/wiki/Primzahlsatz

*Riemann Hypothesis (Distribution of Primes & Zeta Function)*

* The distribution of the prime numbers is very peculiar and therefore interesting. The distribution of primes in (relatively) short intervals shows a certain "randomness", while on the other hand there exist arbitrarily long intervals that contain no prime at all.

* In his paper "Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse" (1859), Bernhard Riemann set out to determine the distribution of the primes with analytic methods, encountered the Riemann ζ-function along the way and formulated the Riemann hypothesis. Building on Riemann's ideas, the prime number theorem was proved in 1896; it tells us, with ever greater relative accuracy for large $x$, how many primes ≤ $x$ there are.

* If one wants to know these counts even more precisely, one quickly reaches an area of mathematical questions with numerous open problems, e.g. the Goldbach problems or questions about twin primes.

https://www.spektrum.de/lexikon/mathematik/primzahlverteilung/8085

https://de.m.wikipedia.org/wiki/Riemannsche_Vermutung

https://de.m.wikipedia.org/wiki/Riemannsche_Zetafunktion

A team of computer scientists have published an innovative polynomial time algorithm for constructing with high probability a prime number of a given size. Their algorithm could be useful in public-key cryptography.

How to Build a Big Prime Number, by Stephen Ornes, Quanta magazine, July 13, 2023, https://www.quantamagazine.org/how-to-build-a-big-prime-number-20230713/

**Modular Form**

* used in string theory

* https://de.m.wikipedia.org/wiki/Modulform

* https://de.m.wikipedia.org/wiki/Ramanujan-Thetafunktion

* https://de.m.wikipedia.org/wiki/Thetafunktion

* https://www.quantamagazine.org/moonshine-link-discovered-for-pariah-symmetries-20170922/

* https://www.quantamagazine.org/universal-math-solutions-in-dimensions-8-and-24-20190513/

* https://de.m.wikipedia.org/wiki/Kaluza-Klein-Theorie

###### *Linear Algebra*

**Spectral Gap (and Combinatorial (Graph) Laplacians)**

* Spectral gap = **measure of connectivity of a graph or quantum system**: the **difference between the second smallest and the smallest eigenvalue** of the normalized Laplacian matrix of a graph (the smallest eigenvalue is always zero, and its multiplicity equals the number of connected components)

* Eigenvalues: $0 = \lambda_0 \leq \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_{n-1}$, where $n$ is the number of nodes (vertices) in the graph. Spectral gap: $\lambda_1 - \lambda_0 = \lambda_1$, which for a connected graph is the smallest non-zero eigenvalue.

* A graph with a large spectral gap ($\lambda_1 \gg 0$) is well-connected. A graph with a small spectral gap is poorly connected.
  * It is related to the graph's **Cheeger constant**: The Cheeger constant is a measure of the graph's connectivity that is closely related to the spectral gap.
  * It is related to the **graph's mixing time**: The mixing time is the time it takes for a random walk on the graph to converge to its **stationary distribution**. A graph with a **large spectral gap has a short mixing time**, while a graph with a small spectral gap has a long mixing time (relevant in quantum annealing). Similar: **Markov Chains**: In the study of Markov chains, the spectral gap is related to the rate of convergence to the stationary distribution. A larger spectral gap typically means faster convergence.
  * Cluster analysis: The spectral gap can be used to find clusters in graphs.
  * Graph partitioning: The spectral gap can be used to partition graphs into two pieces with similar properties.
  * Random walks: The spectral gap can be used to analyze the behavior of random walks on graphs.
  * Diffusion processes: The spectral gap can be used to analyze the behavior of diffusion processes on graphs.
* In TDA the kernels of the combinatorial Laplacians carry the topology of the data: the number of zero eigenvalues of the graph (0-th) Laplacian equals the number of connected components, and the number of zero eigenvalues of the 1st combinatorial Laplacian equals the number of holes (loops) in the data.
  * In gene expression data analysis, the combinatorial Laplacian can be used to identify clusters of genes that are co-expressed. This can be helpful for finding genes that are involved in the same biological process.
  * In protein interaction data analysis, the combinatorial Laplacian can be used to identify protein complexes. This can be helpful for understanding how proteins interact with each other to carry out biological functions.
  * In image data analysis, the combinatorial Laplacian can be used to identify objects in images. This can be helpful for image segmentation and classification.
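The claim that a large spectral gap means "well-connected" can be checked numerically. The following is a minimal NumPy sketch, assuming undirected graphs without isolated vertices; the helper names `normalized_laplacian` and `spectral_gap` are illustrative. It compares a complete graph with a path graph on the same number of vertices.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix without isolated vertices."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def spectral_gap(adj):
    """Second smallest eigenvalue of the normalized Laplacian (the smallest is 0)."""
    eigenvalues = np.linalg.eigvalsh(normalized_laplacian(adj))  # ascending order
    return eigenvalues[1]


n = 8
complete = np.ones((n, n)) - np.eye(n)   # every vertex connected to every other vertex
path = np.zeros((n, n))                  # vertices connected in a line
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1.0

gap_complete = spectral_gap(complete)
gap_path = spectral_gap(path)
print(f"complete graph: {gap_complete:.4f}")  # n / (n - 1) for the complete graph
print(f"path graph:     {gap_path:.4f}")      # much smaller: the path is easy to cut
```

The path graph can be disconnected by removing a single edge, which is reflected in its much smaller gap.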

**Exponential Spectral Gap**

* Exponential spectral gap: spectral gap decreases exponentially with respect to some parameter of system.
As system size grows, **difference between ground state energy (lowest eigenvalue) and first excited state (next smallest eigenvalue) might decrease exponentially**. This can have important implications for the system's behavior, especially in the context of quantum phase transitions or the computational hardness of approximating certain states.

* **The exact interpretation and importance of an exponential spectral gap can vary depending on the context**. In some cases, having such a gap is favorable for certain properties or behaviors, while in other contexts, it might pose challenges or signal specific types of behavior.

* In spectral graph theory the direction matters: a spectral gap that is exponentially *small* in the number of vertices signals a bottleneck, and the mixing time of a random walk on the graph is then very long. Fast mixing requires the opposite, a gap that stays bounded below by a constant as the graph grows. A random walk on such a graph converges to its stationary distribution in a number of steps that grows only logarithmically with the number of vertices.
  * A family of graphs has this property if the second smallest eigenvalue of the normalized Laplacian matrix stays bounded away from zero. Such graphs are very well-connected, and a random walk quickly spreads throughout the graph. Large spectral gaps are important for a number of applications, including:
  * Distributed algorithms: Large spectral gaps can be used to design distributed algorithms that are efficient and scalable.
  * Communication networks: Large spectral gaps can be used to design communication networks that are reliable and efficient.
  * Machine learning: Spectral methods such as spectral clustering tend to be more stable when the relevant gap is large.
  * Here are some examples of graphs with large spectral gaps:
    * Expander graphs: Expander graphs are sparse graphs with good connectivity properties, whose spectral gap is bounded below by a constant. They are often used in the design of distributed algorithms and communication networks.
    * Ramanujan graphs: Ramanujan graphs are a special type of expander graph whose spectral gap is essentially as large as possible for a given degree. They have been used in cryptographic constructions.
    * Random graphs: Random regular graphs with a large number of vertices have a large spectral gap with high probability. This means that they can be used to design efficient and scalable algorithms for a variety of problems.

**Kernel and Spectral Gap**

* There is a connection between kernel and spectral gap, most clearly for symmetric positive semidefinite matrices such as graph Laplacians. Here the spectral gap is the distance between the eigenvalue 0 and the smallest non-zero eigenvalue.

* The kernel of a matrix is the set of all vectors that the matrix maps to the zero vector. If 0 is an eigenvalue, the kernel is exactly the eigenspace belonging to it. Vectors that do not change under the action of the matrix are instead eigenvectors with eigenvalue 1.

* The two notions are related as follows: the kernel collects the directions with eigenvalue exactly 0, and the spectral gap says how far the remaining eigenvalues stay away from 0. For a graph Laplacian the dimension of the kernel equals the number of connected components. The pseudoinverse of a symmetric positive semidefinite matrix has norm 1/gap, so a larger gap makes inversion on the complement of the kernel more stable.

* In machine learning the word "kernel" usually means something different: kernel methods use a kernel function to compute the similarity between two data points, and the resulting kernel (Gram) matrix is positive semidefinite. The eigenvalue spectrum of this matrix, including the gaps between its leading eigenvalues, influences methods such as kernel PCA and spectral clustering, but a larger gap does not by itself guarantee better generalization.

* Graph kernels are similarity measures between graphs in this second sense. Related kernels between the nodes of a single graph, such as the diffusion kernel $e^{-\beta L}$, are built directly from the graph Laplacian $L$, which is where the two meanings of "kernel" meet.

* Overall, the kernel (null space), the spectral gap and kernel functions are distinct concepts that interact through the eigenvalues of Laplacian and Gram matrices.

**More on Spectral theory**

https://en.m.wikipedia.org/wiki/Laplacian_matrix

https://en.m.wikipedia.org/wiki/Discrete_Laplace_operator

https://en.m.wikipedia.org/wiki/Spectral_graph_theory

https://en.m.wikipedia.org/wiki/Spectral_shape_analysis

https://en.m.wikipedia.org/wiki/Spectral_clustering

https://en.m.wikipedia.org/wiki/Laplace_operator

> [Linear algebra for Quantum Mechanics](https://www.youtube.com/watch?v=FF05fXg03A0)

**What is a low rank matrix?**

In linear algebra, the rank of a matrix is the maximum number of linearly independent rows or columns in the matrix. This concept is important because it provides insight into the structure of the matrix, its null space, column space, etc.

A "low-rank" matrix is a matrix where the rank is significantly less than the number of rows or columns. For example, if you have a 1000x1000 matrix (which could potentially have a rank up to 1000), but its rank is only 10, that matrix would be considered low-rank.

Low-rank matrices are useful in many applications, including machine learning and data analysis. They are often used in techniques like principal component analysis (PCA) or singular value decomposition (SVD), where a high-dimensional dataset is approximated by a lower-dimensional one. This is possible because the data can often be accurately represented in a lower-dimensional space (i.e., with a low-rank matrix), which simplifies analysis and reduces computational complexity.

In the context of quantum computing, low-rank matrices are often easier to work with because they require fewer quantum resources to implement in quantum algorithms. For example, certain operations might require a number of qubits proportional to the rank of a matrix, meaning low-rank matrices can be more efficiently processed on a quantum computer.

**What is the difference between pca and svd?**

Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are both techniques used in linear algebra and data analysis, but they are used for somewhat different purposes and their computations are based on different principles. Here's a comparison:

1. **PCA (Principal Component Analysis)** is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The transformation is done in such a way that the first principal component has the largest possible variance (i.e., accounts for as much of the variability in the data as possible), and each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. PCA is often used in exploratory data analysis and for making predictive models.

2. **SVD (Singular Value Decomposition)** is a method of decomposing a matrix into three other matrices. Given a matrix A, you can decompose it into three matrices U, Σ, and V^T such that A = UΣV^T. The U and V matrices contain orthogonal vectors (the left and right singular vectors), and Σ contains the singular values, which can be thought of as the "strength" or "importance" of the corresponding singular vectors. The SVD is a fundamental operation in linear algebra, and it has a wide range of applications. It's used in solving systems of linear equations, inverting matrices, and is the backbone of many data reduction techniques.

The relationship between PCA and SVD is as follows: PCA can be computed by applying the SVD to the centered data matrix, or equivalently by an eigendecomposition of the covariance (or correlation) matrix of the data. The principal directions are the right singular vectors of the centered data matrix, and the variance along each direction is the squared singular value divided by $n - 1$, where $n$ is the number of observations. In other words, PCA can be considered an application of the SVD.

Despite their relationship, the methods are typically used in different contexts: **PCA is mainly used in the context of data analysis, for tasks like dimensionality reduction and extracting useful features from data, while SVD is a more general tool in linear algebra with a wider range of applications**.
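The link between the two methods can be verified numerically. The sketch below uses synthetic data (the random seed and the mixing matrix are arbitrary choices for illustration) and computes PCA twice: once via the eigendecomposition of the covariance matrix and once via the SVD of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 observations of 3 correlated variables.
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)                      # PCA requires centered data

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = s ** 2 / (len(Xc) - 1)

print(np.allclose(eigvals, svd_variances))   # same explained variances
# Principal directions agree up to sign (columns of eigvecs vs rows of Vt).
print(np.allclose(np.abs(eigvecs.T @ Vt.T), np.eye(3), atol=1e-6))
```

In practice the SVD route is usually preferred because it avoids forming the covariance matrix explicitly.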

*Eigenvalues*

**Criteria for the existence of eigenvalues**

(*For an $n \times n$ matrix $B$ the following statements are equivalent: if one of them is true, all are true, and if one is false, all are false. Applied to $B = A - \lambda I$, each of them says that $\lambda$ is an eigenvalue of $A$.*)

1. $\operatorname{rank}(B) < n$

2. $\operatorname{det}(B) = 0$ -> this criterion is the easiest to check and therefore the most commonly used!

3. $B^{-1}$ does not exist ($B$ is not invertible)

4. $B \vec{x} = 0$ has more solutions than just $\vec{x} = 0$

5. $\lambda = 0$ is an eigenvalue of $B$

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1192.png)

*Calculating eigenvalues via the determinant: the tweaked transformation squishes space into a lower dimension (which is why the rank must be < n)*



*Rayleigh Quotient & Courant-Fischer Theorem*

The [Rayleigh quotient](https://de.m.wikipedia.org/wiki/Rayleigh-Quotient), also called the Rayleigh coefficient, is an object from linear algebra.

The Rayleigh quotient is used in particular for the numerical computation of eigenvalues of a square matrix $A$.

Let $A \in {\mathbb K}^{n \times n}$ be a real symmetric or complex Hermitian matrix and $x\in {\mathbb {K} }^{n}$ with $x\neq 0$ a vector. The Rayleigh quotient of $A$ at the vector $x$ is then defined by:

> $R_{A}(x)={\frac  {x^{*}Ax}{x^{*}x}}$

The Rayleigh quotient is closely related to the eigenvalues of $A$. If $v$ is an eigenvector of the matrix $A$ and $\lambda$ the corresponding eigenvalue, then:

> $R_{A}(v)={\frac  {v^{*}Av}{v^{*}v}}={\frac  {v^{*}\lambda v}{v^{*}v}}=\lambda$

The Rayleigh quotient thus maps every eigenvector of $A$ to its eigenvalue $\lambda$. This property is used, among other things, in the numerical computation of eigenvalues.

In particular, for a symmetric or Hermitian matrix $A$ with smallest eigenvalue $\lambda _{{{\rm {min}}}}$ and largest eigenvalue $\lambda _{{{\rm {max}}}}$, the [Courant-Fischer theorem](https://de.m.wikipedia.org/wiki/Satz_von_Courant-Fischer) (which expresses the eigenvalues of a symmetric or Hermitian matrix as minimal or maximal Rayleigh quotients) gives:

> $\lambda _{{{\rm {min}}}}\leq R_{A}(x)\leq \lambda _{{{\rm {max}}}}$

Computing the smallest or largest eigenvalue is therefore equivalent to finding the minimum or maximum of the Rayleigh quotient. Under suitable assumptions this generalizes to the infinite-dimensional case, where it is known as the Rayleigh-Ritz principle.
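A small numerical illustration of these bounds, written in plain Python for a real symmetric $2 \times 2$ example. The matrix is an arbitrary choice with eigenvalues 1 and 3, and the function name `rayleigh_quotient` is illustrative.

```python
import random


def rayleigh_quotient(A, x):
    """R_A(x) = (x^T A x) / (x^T x) for a real symmetric matrix A and x != 0."""
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    return sum(x_i * ax_i for x_i, ax_i in zip(x, Ax)) / sum(x_i * x_i for x_i in x)


# Symmetric matrix with eigenvalues 3 (eigenvector (1, 1)) and 1 (eigenvector (1, -1)).
A = [[2.0, 1.0],
     [1.0, 2.0]]

print(rayleigh_quotient(A, [1.0, 1.0]))    # 3.0, the largest eigenvalue
print(rayleigh_quotient(A, [1.0, -1.0]))   # 1.0, the smallest eigenvalue

# For arbitrary vectors the quotient stays between the two extreme eigenvalues.
random.seed(0)
samples = [rayleigh_quotient(A, [random.uniform(-1, 1), random.uniform(-1, 1)])
           for _ in range(1000)]
print(min(samples), max(samples))
```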

*The [Courant-Fischer theorem](https://de.m.wikipedia.org/wiki/Satz_von_Courant-Fischer) characterizes the eigenvalues of a symmetric positive definite (3 × 3) matrix $A$ via certain extremal points on an ellipsoid. See also [Rayleigh quotient](https://de.m.wikipedia.org/wiki/Rayleigh-Quotient):*

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ellipsoid_Quadric.png/434px-Ellipsoid_Quadric.png)



*Kernel*

**Kernel = Nullspace**

The [kernel of this linear map (linear algebra)](https://en.m.wikipedia.org/wiki/Kernel_(linear_algebra)) is the set of solutions to the equation Ax = 0, where 0 is understood as the zero vector.

In algebra, the [kernel (algebra)](https://en.m.wikipedia.org/wiki/Kernel_(algebra)) of a homomorphism (function that preserves the structure) is generally the inverse image of 0.

* The kernel of a homomorphism is reduced to 0 (or 1) if and only if the homomorphism is injective, that is, if the inverse image of every element of the image consists of a single element (every element of the target has at most one preimage, although there may be elements of the target with no preimage at all).

* This means that the kernel can be viewed as a measure of the degree to which the homomorphism fails to be injective.

![xxx](https://raw.githubusercontent.com/deltorobarba/repo/master/morphismus2.jpg)

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/KerIm_2015Joz_L2.png/320px-KerIm_2015Joz_L2.png)

The kernel of a linear map $f$ is nontrivial if and only if there exists a linearly independent set (more precisely, family) $S$ such that $f(S)$ is linearly dependent. Passing to the negations gives the equivalent statement:

- $\ker f = \{0\}$
- ...if and only if...
- for every linearly independent $S$, $f(S)$ is also linearly independent.

https://www.youtube.com/watch?v=GNf3StvaiFA&list=WL&index=9



*Diagonalizability*

**[Diagonalizability](https://en.m.wikipedia.org/wiki/Diagonalizable_matrix) of matrices**:

* **The eigenvalues lie on the diagonal** of the diagonalized matrix. Moreover: trace = sum of all eigenvalues, determinant = product of all eigenvalues.

* Square matrix $A$ into **invertible matrix** $P$ and **diagonal matrix** $D$ such that  ${\displaystyle P^{-1}AP=D}$. Approach: [Eigendecomposition](https://en.m.wikipedia.org/wiki/Eigendecomposition_of_a_matrix).

> $P^{-1} A P=\left[\begin{array}{cccc}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{array}\right]$

* A rotating body with no external forces acting on it stays in its motion when it rotates about its axis of symmetry. If a basis of eigenvectors exists, the representing matrix with respect to this basis is a diagonal matrix.

* *The diagonalization of a symmetric matrix can be interpreted as a rotation of the axes to align them with the eigenvectors:*

![ggg](https://upload.wikimedia.org/wikipedia/commons/4/4e/Diagonalization_as_rotation.gif)




*Determinant (Permanent, Immanant)*

**[Determinant](https://de.m.wikipedia.org/wiki/Determinante)**:

* [Determinant](https://de.m.wikipedia.org/wiki/Determinante) (only for square matrices; for $n \times m$ matrices one uses e.g. the SVD)

  * indicates how area or volume changes under the linear map. det(A) = 4 $\rightarrow$ the matrix quadruples areas.

  * indicates whether a square linear system has a unique solution, namely exactly when $\operatorname{det}(A) \neq 0$ (the solution can then be written down with Cramer's rule)

  * The determinant is the product of all eigenvalues: $\prod_{i=1}^{n} \lambda_{i}=\operatorname{det}(A)$. In particular, $\operatorname{det}(A) = 0$ exactly when at least one eigenvalue is 0.

* Computation of $\operatorname{det}(A)$: $
A=\left(\begin{array}{ll}
a & c \\
b & d
\end{array}\right)
$ $\rightarrow$ $
\operatorname{det} A=\left|\begin{array}{ll}
a & c \\
b & d
\end{array}\right|=a d-b c
$. *The 2x2 determinant **equals the oriented area** of the parallelogram spanned by its column vectors:*

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Area_parallellogram_as_determinant.svg/220px-Area_parallellogram_as_determinant.svg.png)

* [Laplace expansion](https://de.m.wikipedia.org/wiki/Determinante#Laplacescher_Entwicklungssatz) or [Leibniz formula](https://de.m.wikipedia.org/wiki/Determinante#Leibniz-Formel) (closed form, mainly of theoretical interest). [Gaussian elimination](https://de.m.wikipedia.org/wiki/Gau%C3%9Fsches_Eliminationsverfahren) to triangular form: the determinant is then the product of the [main diagonal](https://de.m.wikipedia.org/wiki/Hauptdiagonale), with a sign change for every row swap. Computer algorithm: [LU decomposition](https://de.m.wikipedia.org/wiki/Gau%C3%9Fsches_Eliminationsverfahren#LR-Zerlegung).
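The Gaussian elimination route can be sketched in a few lines. This is a minimal illustration using exact rational arithmetic from the standard library, not a replacement for an optimized LU routine; the function name `determinant` is illustrative.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant via Gaussian elimination to upper triangular form."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # each row swap flips the sign
        det *= a[col][col]                # accumulate the diagonal product
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


print(determinant([[1, 0], [2, 1]]))   # 1
print(determinant([[0, 1], [1, 0]]))   # -1 (one row swap)
print(determinant([[2, 0], [0, 2]]))   # 4, the map quadruples areas
print(determinant([[1, 2], [2, 4]]))   # 0, the columns are linearly dependent
```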

Take this $2 \times 2$ matrix:

> $
A=\left[\begin{array}{ll}
1 & 0 \\
2 & 1
\end{array}\right]
$

Its [characteristic polynomial](https://en.m.wikipedia.org/wiki/Characteristic_polynomial) (it has the eigenvalues as roots, and the determinant and the trace (= sum of the eigenvalues) of the matrix appear among its coefficients):

> <font color="red">$f(\lambda) = \operatorname{det}(\lambda I - A)$</font> $\quad (= \prod_{i=1}^{n} (\lambda - \lambda_{i}))$

> $\begin{aligned} f(\lambda) & =\operatorname{det}\left(\lambda\left[\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right]-\left[\begin{array}{ll}1 & 0 \\ 2 & 1\end{array}\right]\right) \\ & =\operatorname{det}\left(\left[\begin{array}{cc}\lambda-1 & 0 \\ -2 & \lambda-1\end{array}\right]\right) \\ & =(\lambda-1) \cdot(\lambda-1)-0 \cdot(-2) \\ & =(\lambda-1) \cdot(\lambda-1)\end{aligned}$

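The roots of the polynomial, that is, <font color="red">**the solutions of $f(\lambda) = 0$** (where the determinant $\operatorname{det}(\lambda I - A)$ vanishes; the trace and determinant of $A$ appear as coefficients, here $f(\lambda) = \lambda^2 - \operatorname{tr}(A)\,\lambda + \operatorname{det}(A) = \lambda^2 - 2\lambda + 1$)</font> are: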

>$
\begin{aligned}
& \lambda_1=1 \\
& \lambda_2=1
\end{aligned}
$
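For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\,\lambda + \operatorname{det}(A)$, so the eigenvalues follow from the quadratic formula. A minimal sketch (the function name `eigenvalues_2x2` is illustrative; `cmath` is used so that complex eigenvalues are also handled):

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


A = [[1, 0],
     [2, 1]]
l1, l2 = eigenvalues_2x2(A)
print(l1, l2)              # both roots equal 1 (a double root)
print(l1 + l2, l1 * l2)    # trace and determinant of A
```

The sum of the roots reproduces the trace and their product reproduces the determinant, as stated for the coefficients of the characteristic polynomial.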

https://en.wikipedia.org/wiki/Permanent_(mathematics)

https://en.wikipedia.org/wiki/Immanant

*Characteristic polynomial (Fundamental theorem of algebra)*

[Characteristic polynomial](https://en.m.wikipedia.org/wiki/Characteristic_polynomial) = has the eigenvalues as [roots](https://en.m.wikipedia.org/wiki/Zero_of_a_function) (zeros of a function)

Root (zero of a function): The function f attains the value of 0 at x, or equivalently, x is the solution to the equation f(x) = 0. A root of a polynomial is a zero of the corresponding polynomial function. The [fundamental theorem of algebra](https://en.m.wikipedia.org/wiki/Fundamental_theorem_of_algebra) shows that any non-zero polynomial has a number of roots at most equal to its degree, and that the number of roots equals the degree when complex roots are counted with their multiplicities.

For example, the polynomial f of degree two, defined by $f(x)=x^{2}-5x+6$ has the two roots (or zeros) that are $\lambda_1=2$ and $\lambda_2=3$:

> ${\displaystyle f(2)=2^{2}-5\times 2+6=0{\text{ and }}f(3)=3^{2}-5\times 3+6=0.}$

<font color="red">Zero of a function is important because every equation in the unknown x may be rewritten as $f(x)=0$ by regrouping all the terms in the left-hand side. Hence the study of zeros of functions is exactly the same as the study of solutions of equations.</font>

See [zeros](https://de.m.wikipedia.org/wiki/Nullstelle): the zeros of a function are those values of the domain D at which the element of the codomain W assigned by the map is zero (${\displaystyle 0\in W}$). Zeros of polynomial functions are also called roots.

The [kernel](https://de.m.wikipedia.org/wiki/Kern_(Algebra)) is the solution set of the [homogeneous linear equation](https://de.m.wikipedia.org/wiki/Lineare_Gleichung) f(x)=0 and is also called the null space here (those vectors in V that are mapped to the zero vector in W).

Example: A graph of the function $\cos (x)$ for $x$ in $[-2 \pi, 2 \pi]$, with zeros at $-\frac{3 \pi}{2},-\frac{\pi}{2}, \frac{\pi}{2}$, and $\frac{3 \pi}{2}$, marked in red.


![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/X-intercepts.svg/480px-X-intercepts.svg.png)



*Invertible Matrix*

* A matrix $B$ is not invertible ($B^{-1}$ does not exist) exactly when 0 is one of its eigenvalues. Applied to $B = A - \lambda I$, this is the condition for $\lambda$ to be an eigenvalue of $A$.

* An [invertible matrix](https://en.m.wikipedia.org/wiki/Invertible_matrix) is used in methods to solve systems of linear equations, and **the eigenvalues help to compute the inverse**

> Apply Eigendecomposition to matrix (or other method) $\rightarrow$ Get matrix inverse (with Eigenvectors and a diagonal with Eigenvalues) $\rightarrow$ Solve systems of linear equations

* You can also use pseudo-inverse [Moore-Penrose](https://en.m.wikipedia.org/wiki/Moore–Penrose_inverse) with [matrix solution](https://en.m.wikipedia.org/wiki/System_of_linear_equations#Matrix_solution) to solve systems of linear equations or in [curve fitting](https://de.wikipedia.org/wiki/Ausgleichungsrechnung) (like regression or ML). Methods: QR, Cholesky, Rank decomposition, SVD. 'The pseudoinverse provides a [Linear Least Squares](https://en.m.wikipedia.org/wiki/Linear_least_squares) solution to a system of linear equations.'

  * Exkurs: [Linear least squares (LLS)](https://en.wikipedia.org/wiki/Linear_least_squares) for solving systems of linear equations: is the least squares approximation of linear functions to data (linear regression). **Numerical methods include inverting the matrix of the normal equations and orthogonal decomposition methods**.

  * **Overdetermined case**: A common use of the pseudoinverse is to compute a "best fit" (least squares) solution to a system of linear equations that lacks a solution. See under [Applications](https://en.m.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse#Applications)

  * **Underdetermined case**: Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. In an [underdetermined system](https://en.m.wikipedia.org/wiki/Underdetermined_system) there are **fewer equations than unknowns**, so it has infinitely many solutions, if any. In optimization problems that are subject to linear equality constraints, only one of the solutions is relevant, namely the one giving the **highest or lowest value of an objective function**. The **Moore-Penrose pseudoinverse picks the solution of minimum (Euclidean) norm**; it can be computed using the singular value decomposition. *Example: The solution set for two equations in three variables is, in general, a line ([Source](https://en.m.wikipedia.org/wiki/System_of_linear_equations#General_behavior)):*

  ![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Intersecting_Planes_2.svg/240px-Intersecting_Planes_2.svg.png)

* [Methods to compute inverse of matrix](https://en.m.wikipedia.org/wiki/Invertible_matrix#Methods_of_matrix_inversion): Gaussian elimination, Newton's method, Cayley–Hamilton method, Eigendecomposition, Cholesky decomposition, Analytic solution (Cramer's rule), Blockwise inversion.

* Example: If matrix A can be [eigendecomposed](https://en.m.wikipedia.org/wiki/Invertible_matrix), and if none of its eigenvalues are zero, then A is invertible and its inverse is given by

>${\displaystyle \mathbf {A} ^{-1}=\mathbf {Q} \mathbf {\Lambda } ^{-1}\mathbf {Q} ^{-1}}$

* where $\mathbf {Q}$  is the square ($N×N$) matrix whose i-th column is the **eigenvector** $q_{i}$ of $\mathbf {A}$
* ${\displaystyle \mathbf {\Lambda } }$ is the diagonal matrix whose diagonal elements are the corresponding **eigenvalues** ${\displaystyle \Lambda _{ii}=\lambda _{i}}$.
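
As a sketch of the underdetermined case from the list above, the following NumPy snippet uses `np.linalg.pinv` to pick the minimum-norm solution of two equations in three unknowns. The matrix, right-hand side and null-space vector are illustrative values chosen here, not taken from the sources above.

```python
import numpy as np

# Underdetermined system: 2 equations, 3 unknowns (illustrative values).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([3.0, 0.0])

# The pseudoinverse selects the solution with the smallest Euclidean norm.
x_min = np.linalg.pinv(A) @ b

# Every other solution differs from x_min by a null-space vector of A.
null_vec = np.array([1.0, 1.0, -2.0])   # satisfies A @ null_vec == 0
x_other = x_min + 0.5 * null_vec

print("minimum-norm solution:", x_min, "norm:", np.linalg.norm(x_min))
print("another solution:     ", x_other, "norm:", np.linalg.norm(x_other))
```

Both vectors solve the system; the one returned by the pseudoinverse is the shorter of the two.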

*Rank of a matrix*


The rank of a matrix is the maximal number of linearly independent columns (equivalently, of linearly independent rows).

So **low rank means a low number** of linearly independent vectors. A low-rank matrix can be as simple as this single column (rank 1):

$\left[\begin{array}{l}1 \\ 2 \\ 3\end{array}\right]$


* a low-rank matrix (whether an approximation or not) is simply a matrix for which the number of linearly independent rows or columns is much smaller than the actual number of rows or columns.

* Viewed as a linear transformation, its range has small dimension and, correspondingly, its null space has large dimension.

* Quantum computers give exponential speedups for high rank matrices

*Aside: Sparse Matrix vs Low Rank Matrix*

Sparse matrices and low-rank matrices are two very different types of objects. The matrix

$\left[\begin{array}{llll}1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4\end{array}\right]$


is sparse (meaning it has a lot of zero entries) but not low-rank (as a matter of fact it’s full rank). On the other hand, the matrix

$\left[\begin{array}{cccc}1 & 2 & 3 & 4 \\ 2 & 4 & 6 & 8 \\ 3 & 6 & 9 & 12 \\ 4 & 8 & 12 & 16\end{array}\right]$

is low-rank but not sparse!

There's however a connection to be made between the two concepts: **a low-rank matrix has a sparse set of singular values**.

Take the following singular value decomposition for a general matrix $\mathrm{X}$

> $
\mathrm{X}=\mathrm{USV}^T
$

Then the number of non-zero entries in (the diagonal of) $S$ is precisely equal to the rank of $\mathbf{X}$. Thus, a sparse $\mathbf{S}$ leads to a low-rank $\mathbf{X}$ and vice-versa.
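
A short numerical check of this statement, using the two $4 \times 4$ matrices from above. The helper `numerical_rank` and its tolerance are assumptions made for this sketch; in floating point, "non-zero" has to be read as "above a small threshold".

```python
import numpy as np

# The sparse but full-rank matrix from above.
sparse_full_rank = np.diag([1.0, 2.0, 3.0, 4.0])

# The dense but low-rank matrix from above: the outer product of (1, 2, 3, 4) with itself.
v = np.array([1.0, 2.0, 3.0, 4.0])
dense_low_rank = np.outer(v, v)

def numerical_rank(X, tol=1e-10):
    """Count singular values that are non-zero up to a relative tolerance."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tol * s.max()))

for name, X in (("sparse", sparse_full_rank), ("dense", dense_low_rank)):
    zeros = int(np.sum(X == 0))
    print(name, "| zero entries:", zeros, "| rank:", numerical_rank(X))
```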


*Algebraic and geometric multiplicity of eigenvalues*

> How many times is an eigenvalue repeated as a root of the characteristic polynomial? = algebraic multiplicity

> By how many linearly independent vectors is the eigenspace of $\lambda_1$ (an eigenvalue with multiplicity 1 or more) spanned? = geometric multiplicity

> The geometric multiplicity of an eigenvalue is always less than or equal to its algebraic multiplicity

> When the geometric multiplicity of a repeated eigenvalue is **strictly less** than its algebraic multiplicity, then that eigenvalue is said to be **defective**

https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors#Eigenspaces,_geometric_multiplicity,_and_the_eigenbasis_for_matrices

https://en.m.wikipedia.org/wiki/Multiplicity_(mathematics)#Multiplicity_of_a_root_of_a_polynomial

*algebraic multiplicity*

The **algebraic multiplicity** of an eigenvalue is the number of times it appears as a root of the characteristic polynomial (i.e., the polynomial whose roots are the eigenvalues of a matrix).


Take this $2 \times 2$ matrix:

> $
A=\left[\begin{array}{ll}
1 & 0 \\
2 & 1
\end{array}\right]
$

Its characteristic polynomial is:

> $\begin{aligned} f(\lambda) & =\operatorname{det}\left(\lambda\left[\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right]-\left[\begin{array}{ll}1 & 0 \\ 2 & 1\end{array}\right]\right) \\ & =\operatorname{det}\left(\left[\begin{array}{cc}\lambda-1 & 0 \\ -2 & \lambda-1\end{array}\right]\right) \\ & =(\lambda-1) \cdot(\lambda-1)-0 \cdot(-2) \\ & =(\lambda-1) \cdot(\lambda-1)\end{aligned}$

The roots of the polynomial, that is, the solutions of $f(\lambda) = 0$  are:

>$
\begin{aligned}
& \lambda_1=1 \\
& \lambda_2=1
\end{aligned}
$

Thus, $A$ has one repeated eigenvalue whose algebraic multiplicity is

>$
\mu\left(\lambda_1\right)=\mu\left(\lambda_2\right)=2
$

*geometric multiplicity*

The **geometric multiplicity** of an eigenvalue is the dimension of the linear space of its associated eigenvectors (i.e., its eigenspace).

If the Eigenspace of $\lambda_1$ is generated only by a single vector, it has dimension 1. As a consequence, the geometric multiplicity of $\lambda_1$ is 1, less than its algebraic multiplicity, which is equal to 2.

See complete example in [this pdf document](https://raw.githubusercontent.com/deltorobarba/repo/master/multiplicity.pdf).
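
The two multiplicities of the example matrix can also be checked numerically. This is a minimal sketch: the geometric multiplicity is computed as the dimension of the null space of $A - \lambda I$ via the rank-nullity theorem, and the algebraic multiplicity by counting numerically equal eigenvalues, which is only reliable for small, well-separated examples like this one.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0]])
lam = 1.0
n = A.shape[0]

# Algebraic multiplicity: how often lam occurs among the eigenvalues.
eigenvalues = np.linalg.eigvals(A)
algebraic_multiplicity = int(np.sum(np.isclose(eigenvalues, lam)))

# Geometric multiplicity: dim of the eigenspace = n - rank(A - lam * I).
geometric_multiplicity = n - np.linalg.matrix_rank(A - lam * np.eye(n))

print("algebraic:", algebraic_multiplicity)
print("geometric:", geometric_multiplicity)
print("defective:", geometric_multiplicity < algebraic_multiplicity)
```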

*Condition Numbers*

https://blogs.mathworks.com/cleve/2017/07/17/what-is-the-condition-number-of-a-matrix/

What does a low condition number mean in linear algebra?

The condition number of a matrix in linear algebra is a measure of the matrix's sensitivity to numerical errors. This number is used in the context of numerical analysis to understand the stability of a problem, specifically in the context of linear systems of equations.

The condition number is calculated as the ratio of the largest singular value of a matrix to the smallest singular value. In other words, it gives us a measure of how much the output value of the function can change for a small change in the input.

If a matrix has a low condition number, it means the matrix is well-conditioned, implying that the system of equations has a unique solution, and this solution can be found accurately by numerical methods. The solution will not significantly change even if the input is perturbed slightly.

On the other hand, if the condition number is high, the matrix is said to be ill-conditioned. This indicates that the system is sensitive to small changes in the input, and even a slight change in the input could produce a large change in the result. Thus, the solution may not be accurately computable due to rounding errors and other numerical errors.

So a low condition number in linear algebra is generally a good thing, as it indicates a stable and accurate system.

Here is an example that contrasts an ill-conditioned (in fact singular) matrix with a well-conditioned one, meaning one with a low condition number.

Let's consider a simple 2x2 matrix A:

    A = [4 2]
        [2 1]

The singular values of a matrix are the square roots of the eigenvalues of the matrix A^T*A. For matrix A, we can compute A^T*A and find its eigenvalues.

    A^T * A = [20 10]
              [10  5]

The eigenvalues of this matrix are the solutions to the characteristic equation:

    det(A^T*A - λI) = 0

For the above 2x2 matrix, this gives the equation

    λ^2 - 25λ = 0

Solving this equation gives us the eigenvalues 0 and 25. Therefore, the singular values of matrix A are sqrt(0) = 0 and sqrt(25) = 5.

The condition number of the matrix is the ratio of the maximum singular value to the minimum singular value. Because the smallest singular value is 0, the ratio 5/0 is unbounded and the condition number of A is infinite: A is singular (its second row is half of its first row), which is the extreme case of an ill-conditioned matrix. Small perturbations in the input can lead to arbitrarily large changes in the output.

But let's consider another matrix B:

    B = [2 1]
        [1 2]

The singular values of matrix B are 1 and 3 (B is symmetric positive definite, so its singular values coincide with its eigenvalues). Hence, the condition number of matrix B is 3 / 1 = 3, which is low. So this matrix is well-conditioned, and small perturbations in the input will only cause small changes in the output.

Please note that in practice, you'd typically use a numerical library or software to compute singular values and condition numbers of matrices, especially for larger and more complex matrices.
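
For instance, NumPy exposes both the singular values and the 2-norm condition number directly. The sketch below applies `np.linalg.svd` and `np.linalg.cond` to the two matrices A and B from this example; for the singular matrix the reported condition number is either `inf` or an extremely large number, depending on rounding.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 1.0]])   # singular: second row is half of the first
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Singular values, sorted from largest to smallest.
sv_A = np.linalg.svd(A, compute_uv=False)
sv_B = np.linalg.svd(B, compute_uv=False)

# 2-norm condition number = largest / smallest singular value.
cond_A = np.linalg.cond(A)
cond_B = np.linalg.cond(B)

print("A: singular values", sv_A, "condition number", cond_A)
print("B: singular values", sv_B, "condition number", cond_B)
```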

*Where is the border between low and high condition number (which value)?*

There's no universally accepted threshold for when a condition number is considered "high" or "low" as it can often depend on the context or specific application. That being said, some guidelines are typically used.

Generally, a matrix with a condition number close to 1 is considered to be well-conditioned, which means that it is not sensitive to changes in the input values. As the condition number increases, the matrix becomes increasingly ill-conditioned.

As a rough rule of thumb, about log10(κ(A)) significant digits can be lost when solving a linear system with matrix A. If the condition number κ(A) approaches 1/ε, where ε is the machine precision (around 10^-16 for double-precision floating-point arithmetic), essentially no correct digits remain and the matrix is considered numerically singular. For example, a condition number of 10^12 indicates that up to 12 of the roughly 16 available digits of precision could be lost in the computation.

In practice, condition numbers in the range of 1-10 are generally considered good, 10-100 might be cause for some concern, and 100+ is usually considered problematic, although these thresholds are rather arbitrary and can depend heavily on the specific context and precision of the computations.

However, these are general guidelines and can vary significantly depending on the context and the specific application. Therefore, it's crucial to interpret the condition number in the context of the specific problem you're trying to solve.

*Get Eigenvalues: **Algorithms** (Power & Inverse Iteration, Characteristic Polynomial) & **Matrix Decomposition** (Eigendecomposition/Spectrum, Schur, SVD)*

> [Eigenvalue algorithm](https://en.m.wikipedia.org/wiki/Eigenvalue_algorithm) - matrices are diagonalized numerically using computer software. See [List of Eigenvalue algorithms](https://en.m.wikipedia.org/wiki/List_of_numerical_analysis_topics#Eigenvalue_algorithms).

**Small Matrices: finding the eigenvalues of the matrix $A$ with the determinant and the characteristic polynomial**

In practice, eigenvalues of large matrices are not computed using the characteristic polynomial [Source](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Numerical_computations)

> $A=\left(\begin{array}{lll}2 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 1 & 3\end{array}\right)$

**Step 1: Using the identity matrix $E_{n}$, form the matrix $\left(A-\lambda E_{n}\right)$**

> $\left(A-\lambda E_{n}\right)=\left(\begin{array}{lll}2 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 1 & 3\end{array}\right)-\left(\begin{array}{ccc}\lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda\end{array}\right)=\left(\begin{array}{ccc}2-\lambda & 1 & 2 \\ 1 & 2-\lambda & 2 \\ 1 & 1 & 3-\lambda\end{array}\right)$

**Step 2: Compute the determinant $\operatorname{det}\left(A-\lambda E_{n}\right) = \chi_{A}(\lambda)$ $\rightarrow$ [characteristic polynomial](https://de.m.wikipedia.org/wiki/Charakteristisches_Polynom)**

> $\operatorname{det}\left(A-\lambda E_{n}\right)=\operatorname{det}\left(\begin{array}{ccc}2-\lambda & 1 & 2 \\ 1 & 2-\lambda & 2 \\ 1 & 1 & 3-\lambda\end{array}\right)$

$=(2-\lambda)^{2} \cdot(3-\lambda)+2+2-2 \cdot(2-\lambda)-2 \cdot(2-\lambda)-(3-\lambda)$
$=-\lambda^{3}+7 \lambda^{2}-11 \lambda+5$ $\quad$ (= polynomial)

**Step 3: Find the roots of the characteristic polynomial; these are the eigenvalues of the matrix $A$ (for them the determinant becomes zero)**

> $(A-\lambda E_{n}) \cdot v=0$

* By trial and error: first root $\lambda_{1}=1$.

* Factoring out $(\lambda-1)$ gives:
$-\lambda^{3}+7 \lambda^{2}-11 \lambda+5=(\lambda-1) \cdot\left(-\lambda^{2}+6 \lambda-5\right)$.

* Applying the [quadratic formula](https://de.m.wikipedia.org/wiki/Quadratische_Gleichung#Lösungsformel_für_die_allgemeine_quadratische_Gleichung_(a-b-c-Formel)) to the remaining factor: $
\lambda_{2,3}=\frac{-6 \pm \sqrt{36-20}}{-2}=3 \mp 2$

* Thus the three eigenvalues of the matrix are $\lambda_{1}=\lambda_{2}=1$ and $\lambda_{3}=5$

*If there is a number $\lambda$ and a nonzero vector $v$ such that multiplication by the matrix $\left(A-\lambda E_{n}\right)$ maps $v$ to the zero vector, then this matrix does not have full rank. This means that its determinant is zero. Vectors such as $v$ and $w$ are linearly dependent.*
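
The hand calculation can be cross-checked numerically. The sketch below finds the roots of the polynomial $-\lambda^{3}+7\lambda^{2}-11\lambda+5$ with `np.roots` and compares them with the eigenvalues returned by `np.linalg.eigvals`. Because $\lambda = 1$ is a repeated root, the numerical values may carry small rounding errors (possibly tiny imaginary parts), so only the real parts are compared, with a tolerance.

```python
import numpy as np

A = np.array([[2.0, 1.0, 2.0],
              [1.0, 2.0, 2.0],
              [1.0, 1.0, 3.0]])

# Coefficients of -λ^3 + 7λ^2 - 11λ + 5, highest power first.
char_poly = [-1.0, 7.0, -11.0, 5.0]

# Roots of the characteristic polynomial (real parts, sorted).
roots = np.sort(np.real(np.roots(char_poly)))

# Eigenvalues computed directly from the matrix (real parts, sorted).
eigs = np.sort(np.real(np.linalg.eigvals(A)))

print("roots of the polynomial:", roots)
print("eigenvalues of A:       ", eigs)
print("agree:", np.allclose(roots, eigs, atol=1e-6))
```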


**Eigendecomposition (spectral decomposition)**

* [Eigendecomposition](https://en.m.wikipedia.org/wiki/Eigendecomposition_of_a_matrix) or sometimes spectral decomposition, see [spectral theorem](https://en.m.wikipedia.org/wiki/Spectral_theorem), is the factorization of a matrix into a canonical form (Normalform) - the matrix is represented in terms of its eigenvalues and eigenvectors. Matrix must be diagonalizable. See also [Summary of Eigendecomposition](https://en.m.wikipedia.org/wiki/Matrix_decomposition#Decompositions_based_on_eigenvalues_and_related_concepts).

> $A = V D V^{-1}$

* $D$ = eigenvalues of $A$ (diagonal)
* $V$ = eigenvectors of $A$

https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html
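
A minimal sketch of the factorization $A = V D V^{-1}$ with `numpy.linalg.eig`. The $2 \times 2$ matrix is an illustrative choice (eigenvalues 2 and 5), not taken from the sources above.

```python
import numpy as np

# Illustrative diagonalizable matrix.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, V = np.linalg.eig(A)
D = np.diag(eigenvalues)

# Rebuild A from its eigenvalues and eigenvectors.
A_rebuilt = V @ D @ np.linalg.inv(V)

print("eigenvalues:", eigenvalues)
print("reconstruction matches A:", np.allclose(A_rebuilt, A))
```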

**Schur decomposition**

* [Schur decomposition](https://en.m.wikipedia.org/wiki/Schur_decomposition) is a [matrix decomposition](https://en.m.wikipedia.org/wiki/Matrix_decomposition).

* Take complex square matrix and **get an upper triangular matrix whose diagonal elements are the eigenvalues** of the original matrix.

**Singular Value Decomposition**

* [Singular Value Decomposition](https://de.m.wikipedia.org/wiki/Singulärwertzerlegung) is used in calculation of other matrix operations, such as matrix inverse, but also as a data reduction method in machine learning.

* SVD can also be used in least squares linear regression, image compression, and denoising data. See also [spectral norm](https://de.m.wikipedia.org/wiki/Spektralnorm)





*Solving Systems of Linear Equations: **Matrix Decomposition** (Gaussian Elimination, Cholesky, QR, LU)*

Classical: [Gaussian Elimination](https://en.m.wikipedia.org/wiki/Gaussian_elimination) (row reduction). See [List of numerical algorithms to solve systems of linear equation](https://en.m.wikipedia.org/wiki/List_of_numerical_analysis_topics#Solving_systems_of_linear_equations)

**Cholesky Decomposition** (Hermitian, positive-definite / square)

* **alternative to Eigendecomposition** to get matrix inverse for solving linear equations

* [Cholesky decomposition](https://en.m.wikipedia.org/wiki/Cholesky_decomposition) of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose,

* useful for efficient numerical solutions, e.g., Monte Carlo simulations (solving linear least squares for linear regression, as well as simulation and optimization methods)

* When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.

> $A = LL^T$

* $L$ is the lower triangular matrix and $L^T$ is the transpose of $L$. An equivalent decomposition in terms of an upper triangular matrix $U$ ($A = U^T U$) is also possible
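
A small sketch of solving $Ax = b$ through the Cholesky factor, assuming a real symmetric positive-definite matrix. The matrix and right-hand side are illustrative values. `np.linalg.solve` is a general solver and does not exploit the triangular structure; a dedicated triangular solver would be used in production code.

```python
import numpy as np

# Illustrative symmetric positive-definite matrix and right-hand side.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([2.0, 1.0])

# Factor A = L @ L.T with L lower triangular.
L = np.linalg.cholesky(A)

# Solve in two triangular steps: L y = b, then L^T x = y.
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)

print("L =\n", L)
print("x =", x, "| residual norm:", np.linalg.norm(A @ x - b))
```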

**QR Decomposition**

* We can use QR decomposition to [find the determinant of a square matrix](https://en.m.wikipedia.org/wiki/QR_decomposition#Connection_to_a_determinant_or_a_product_of_eigenvalues)

* The [QR decomposition](https://en.m.wikipedia.org/wiki/QR_decomposition) is for m x n matrices (not limited to square matrices) and decomposes a matrix into $Q$ (orthogonal, $Q^T Q = I$, or unitary, $Q^{*} Q = I$, of size m x m) and $R$ (an upper triangular matrix of size m x n) components.

> $A = Q R$

* Like the LU decomposition, the QR decomposition is often used to solve systems of linear equations, **although is <u>not</u> limited to square matrices**.

* There are several methods for actually computing the QR decomposition, such as [Householder transformations](https://de.m.wikipedia.org/wiki/Householdertransformation), [Givens rotations](https://de.m.wikipedia.org/wiki/Givens-Rotation) and the [Gram–Schmidt orthogonalization method](https://de.m.wikipedia.org/wiki/Gram-Schmidtsches_Orthogonalisierungsverfahren)
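
A sketch of using QR on a non-square system, with illustrative data (three equations, two unknowns). Note that `np.linalg.qr` returns the *reduced* factorization by default ($Q$ of size m x n, $R$ of size n x n) rather than the full m x m form described above; the least-squares solution then follows from $R x = Q^T b$.

```python
import numpy as np

# Illustrative overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Reduced QR: Q has orthonormal columns, R is square upper triangular.
Q, R = np.linalg.qr(A)

# Least-squares solution from R x = Q^T b.
x = np.linalg.solve(R, Q.T @ b)

print("Q^T Q is the identity:", np.allclose(Q.T @ Q, np.eye(2)))
print("R is upper triangular:", np.allclose(R, np.triu(R)))
print("least-squares solution:", x)
```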

**LU Decomposition**

* The [LU decomposition](https://en.m.wikipedia.org/wiki/LU_decomposition) is often used to simplify the **solving of systems of linear equations**, such as **finding the coefficients in a linear regression**, as well as in **calculating the determinant and inverse** of a matrix.

* Lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix.

* The product sometimes includes a permutation matrix as well. LU decomposition can be viewed as the matrix form of Gaussian elimination.

* Computers usually solve square systems of linear equations using LU decomposition, and it is also a key step when inverting a matrix or computing the determinant of a matrix.

The **LU decomposition is for square matrices** and decomposes a matrix into L and U components. Let A be a square matrix. An LU factorization refers to the factorization of A, with proper row and/or column orderings or permutations, into two factors – a **lower triangular matrix L** and an **upper triangular matrix U**:

> A = L U

* The LU decomposition is found by a <u>step-by-step elimination process</u>. Without row exchanges it **fails whenever a zero pivot is encountered**, which is why in practice a permutation matrix is included ($PA = LU$).

* In the lower triangular matrix all elements above the diagonal are zero, in the upper triangular matrix, all the elements below the diagonal are zero. For example, for a 3 × 3 matrix A, its LU decomposition looks like this:

> $\left[\begin{array}{lll}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right]=\left[\begin{array}{ccc}
l_{11} & 0 & 0 \\
l_{21} & l_{22} & 0 \\
l_{31} & l_{32} & l_{33}
\end{array}\right]\left[\begin{array}{ccc}
u_{11} & u_{12} & u_{13} \\
0 & u_{22} & u_{23} \\
0 & 0 & u_{33}
\end{array}\right]$
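
To make the elimination process concrete, here is a minimal sketch of an LU factorization *without* pivoting. The function name `lu_no_pivot` and the test matrix are assumptions made for this illustration. It uses the Doolittle convention, in which the diagonal of $L$ is fixed to 1, a special case of the general form shown above. Library routines add row exchanges for numerical stability.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorization without row exchanges (illustrative only)."""
    U = np.array(A, dtype=float)   # working copy that becomes U
    n = U.shape[0]
    L = np.eye(n)                  # unit diagonal by convention
    for k in range(n - 1):
        if np.isclose(U[k, k], 0.0):
            raise ValueError("zero pivot: row exchanges (pivoting) are needed")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # elimination multiplier
            U[i, :] -= L[i, k] * U[k, :]     # zero out the entry below the pivot
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)

print("L =\n", L)
print("U =\n", U)
print("L @ U reproduces A:", np.allclose(L @ U, A))
# The determinant is the product of the pivots on the diagonal of U.
print("det(A) from U:", np.prod(np.diag(U)))
```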

*Solving Systems of Linear Equations: **Matrix Type** (Square & Non-Square)*

**Part I: Solving Systems of Linear Equations: <u>Square Matrix</u> (for HHL) - Eigendecomposition to get Pseudo-inverse**

> https://towardsdatascience.com/from-eigendecomposition-to-determinant-fundamental-mathematics-for-machine-learning-with-1b6b449a82c6


<font color="red">**If A is a square matrix (and has [full rank](https://de.m.wikipedia.org/wiki/Rang_(Mathematik))) in a linear system of equations: get the matrix inverse via eigendecomposition (in the general case, the Moore–Penrose pseudoinverse provides a least squares solution to a system of linear equations).**

> $A x = b$

then you can take the [inverse](https://de.m.wikipedia.org/wiki/Inverse_Matrix) of A to solve for x:

> $x = A^{-1} b$

<font color="red">*This part is important for HHL:*

* $\hat{A} = \hat{A}^{\dagger}$ $\quad$ - Hermitian operators are [Self-adjoint operators](https://en.m.wikipedia.org/wiki/Self-adjoint_operator)

* Adjoint matrix = transposed + complex conjugated (sign of the imaginary part flipped). Hermitian: self-adjoint (for real matrices: symmetric)

* Following from this, in bra-ket notation: $
\left\langle\phi_{i}|\hat{A}| \phi_{j}\right\rangle=\left\langle\phi_{j}|\hat{A}| \phi_{i}\right\rangle^{*}
$


<font color="red">*Since $A$ is Hermitian, it has a spectral decomposition : $
A=\sum_{j=0}^{N-1} \lambda_{j}\left|u_{j}\right\rangle\left\langle u_{j}\right|, \quad \lambda_{j} \in \mathbb{R}
$*

<font color="red">*You need the eigendecomposition (spectral decomposition) to get the inverse of a matrix (here Hermitian and hence normal)*

Getting the [Matrix inverse via eigendecomposition](https://en.m.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Matrix_inverse_via_eigendecomposition): If a matrix $\mathbf{A}$ can be eigendecomposed and if none of its eigenvalues are zero, then $\mathbf{A}$ is invertible and its inverse is given by

>$
\mathbf{A}^{-1}=\mathbf{Q} \mathbf{\Lambda}^{-1} \mathbf{Q}^{-1}
$

If $\mathbf{A}$ is a symmetric matrix, since $\mathbf{Q}$ is formed from the eigenvectors of $\mathbf{A}, \mathbf{Q}$ is guaranteed to be an orthogonal matrix, therefore $\mathbf{Q}^{-1}=\mathbf{Q}^{\mathrm{T}}$. Furthermore, because $\mathbf{\Lambda}$ is a diagonal matrix, its inverse is easy to calculate:

>$
\left[\Lambda^{-1}\right]_{i i}=\frac{1}{\lambda_{i}}
$

**Example - the matrix:**

$A=\left[\begin{array}{ll}2 & 3 \\ 2 & 1\end{array}\right]$

has the eigenvectors:

$\mathbf{u}_{1}=\left[\begin{array}{l}3 \\ 2\end{array}\right] \quad$ with eigenvalue $\quad \lambda_{1}=4$

and:

$\mathbf{u}_{2}=\left[\begin{array}{r}-1 \\ 1\end{array}\right] \quad$ with eigenvalue $\quad \lambda_{2}=-1$

We can verify (as illustrated in Figure 1) that only the length of $\mathbf{u}_{1}$ and $\mathbf{u}_{2}$ is changed when one of these two vectors is multiplied by the matrix $\mathbf{A}$ :


$\left[\begin{array}{ll}2 & 3 \\ 2 & 1\end{array}\right]\left[\begin{array}{l}3 \\ 2\end{array}\right]=4\left[\begin{array}{l}3 \\ 2\end{array}\right]=\left[\begin{array}{c}12 \\ 8\end{array}\right]$

and

$\left[\begin{array}{ll}2 & 3 \\ 2 & 1\end{array}\right]\left[\begin{array}{r}-1 \\ 1\end{array}\right]=-1\left[\begin{array}{r}-1 \\ 1\end{array}\right]=\left[\begin{array}{r}1 \\ -1\end{array}\right]$

For most applications we normalize the eigenvectors (i.e. transform them such that their length is equal to one):

$
\mathbf{u}^{\top} \mathbf{u}=1 \text {. }
$

For the previous example we obtain:

$
\mathbf{u}_{1}=\left[\begin{array}{l}
.8321 \\
.5547
\end{array}\right]
$

Aside: how to normalize a vector:

1) The magnitude of $\left(\begin{array}{l}3 \\ 2\end{array}\right)$ is $\sqrt{3^{2}+2^{2}}=3.6055$. To normalize the vector, multiply it by 1/magnitude:

2) $
\frac{1}{3.6055} \cdot\left(\begin{array}{l}
3 \\
2
\end{array}\right)= 0.27735 \cdot\left(\begin{array}{l}
3 \\
2
\end{array}\right) =\left(\begin{array}{l}
.8321 \\
.5547
\end{array}\right)
$


We can check that:

$
\left[\begin{array}{ll}
2 & 3 \\
2 & 1
\end{array}\right]\left[\begin{array}{l}
.8321 \\
.5547
\end{array}\right]=\left[\begin{array}{l}
3.3284 \\
2.2188
\end{array}\right]=4\left[\begin{array}{l}
.8321 \\
.5547
\end{array}\right]
$

and

$
\left[\begin{array}{ll}
2 & 3 \\
2 & 1
\end{array}\right]\left[\begin{array}{r}
-.7071 \\
.7071
\end{array}\right]=\left[\begin{array}{r}
.7071 \\
-.7071
\end{array}\right]=-1\left[\begin{array}{r}
-.7071 \\
.7071
\end{array}\right]
$
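
The example can be replayed in NumPy. This sketch normalizes the two eigenvectors, checks the eigenvalue equations, and then builds the inverse from the eigendecomposition as $Q \Lambda^{-1} Q^{-1}$. Since this $A$ is not symmetric, $Q$ is not orthogonal here, so $Q^{-1}$ is computed explicitly rather than replaced by $Q^{T}$.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
u1 = np.array([3.0, 2.0])    # eigenvector for eigenvalue 4
u2 = np.array([-1.0, 1.0])   # eigenvector for eigenvalue -1

# Normalize to unit length: divide by the Euclidean norm.
u1_hat = u1 / np.linalg.norm(u1)
u2_hat = u2 / np.linalg.norm(u2)

# Eigenvectors as columns of Q, reciprocal eigenvalues on the diagonal.
Q = np.column_stack([u1_hat, u2_hat])
Lambda_inv = np.diag([1.0 / 4.0, 1.0 / -1.0])

# Inverse via eigendecomposition.
A_inv = Q @ Lambda_inv @ np.linalg.inv(Q)

print("normalized u1:", u1_hat)
print("normalized u2:", u2_hat)
print("A @ A_inv is the identity:", np.allclose(A @ A_inv, np.eye(2)))
```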




**Part II: Solving Systems of Linear Equations: <u>Non-Square Matrix</u> - Singular value decomposition (SVD) to get Pseudo-inverse**

<font color="red">**If a matrix A is not square: singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any $m\times n$ matrix. It is related to the polar decomposition.**

> $A x = b$ $\quad$ ($A$ is not regular)

You multiply both sides by $A^{T}$, the transpose of A:

> $A^{T} A x = A^{T} b$

Then move $A^{T} A$ to the right-hand side by multiplying with its inverse (which exists when $A$ has full column rank). The result is no longer an exact solution $x$, but the [least squares](https://de.m.wikipedia.org/wiki/Methode_der_kleinsten_Quadrate#Lineare_Modellfunktion) solution $\hat{x}$

> $\hat{x} =$ <font color="blue">$(A^{T} A)^{-1} A^{T}$</font> $b$

And that term <font color="blue">$(A^{T} A)^{-1} A^{T}$</font> is known as the (Moore–Penrose) pseudo-inverse <font color="blue">$A^{+}$</font>:

> $\hat{x} =$ <font color="blue">$A^{+}$</font> $b$

Applying the pseudo-inverse to $b$ thus gives nothing other than our least-squares solution. See more details under [Numerical methods for linear least squares](https://en.m.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares)

* $A$ is a non-square matrix

* to solve the system you would need the inverse, but the ordinary eigendecomposition doesn't work for non-square matrices

> you need to compute the Moore–Penrose pseudo-inverse instead, and for that you apply the singular value decomposition in place of the eigendecomposition

* SVD is a type of [Matrix decomposition](https://en.m.wikipedia.org/wiki/Matrix_decomposition), specifically one based on eigenvalue concepts. The SVD also yields a rank factorization (besides the broader [Rank factorization methods](https://en.m.wikipedia.org/wiki/Rank_factorization))

* matrix decompositions in general are used to take a large matrix apart (e.g. LU factorizes a matrix into a lower triangular matrix L and an upper triangular matrix U) so the new matrices require fewer additions and multiplications to solve, compared with the original system $A\mathbf {x} =\mathbf {b}$

<font color="red">**Example:**

> $A x = b$

> $\left[\begin{array}{cc}2 & -2 \\ -2 & 2 \\ 5 & 3\end{array}\right] x=\left[\begin{array}{c}-1 \\ 7 \\ -26\end{array}\right]$

* We can immediately see that matrix A has rank 2: the first two rows are multiples of each other (row 2 = $-1 \cdot$ row 1), so together they contribute only one independent row, while the last row (5 and 3) is not a multiple of them.

* Now we can apply the pseudo-inverse to find the least-squares solution:

> $\hat{x} =$ <font color="blue">$(A^{T} A)^{-1} A^{T}$</font> $b$

> $\hat{x}=$ <font color="blue">$\left(\left[\begin{array}{ccc}2 & -2 & 5 \\ -2 & 2 & 3\end{array}\right]\left[\begin{array}{cc}2 & -2 \\ -2 & 2 \\ 5 & 3\end{array}\right]\right)^{-1}\left[\begin{array}{ccc}2 & -2 & 5 \\ -2 & 2 & 3\end{array}\right]$</font>$\left[\begin{array}{c}-1 \\ 7 \\ -26\end{array}\right]$

> $\hat{x}=$$\left[\begin{array}{l}-4 \\ -2\end{array}\right]$


*Inserting the least-squares $\hat{x}$ into the original equation:*

> $\left[\begin{array}{cc}2 & -2 \\ -2 & 2 \\ 5 & 3\end{array}\right]\left[\begin{array}{l}-4 \\ -2\end{array}\right]$ = $\left[\begin{array}{c}2 \cdot(-4)+(-2) \cdot(-2) \\ (-2)\cdot (-4)+2 \cdot (-2) \\ 5\cdot (-4)+3 \cdot (-2)\end{array}\right]$ = $\left[\begin{array}{c}-4 \\ 4 \\ -26\end{array}\right]$ $\approx$ $\left[\begin{array}{c}-1 \\ 7 \\ -26\end{array}\right]$
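
The same example in NumPy, computed three ways as a cross-check: with the normal-equations formula $(A^{T}A)^{-1}A^{T}b$, with the SVD-based `np.linalg.pinv`, and with `np.linalg.lstsq`. This is a sketch for this small example; for larger or ill-conditioned problems, forming $(A^{T}A)^{-1}$ explicitly is generally avoided.

```python
import numpy as np

A = np.array([[2.0, -2.0],
              [-2.0, 2.0],
              [5.0, 3.0]])
b = np.array([-1.0, 7.0, -26.0])

# Normal equations: x = (A^T A)^{-1} A^T b
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# Moore-Penrose pseudoinverse, computed internally via the SVD.
x_pinv = np.linalg.pinv(A) @ b

# Dedicated least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# The residual of a least-squares solution is orthogonal to the columns of A.
residual = A @ x_pinv - b

print("normal equations:", x_normal)
print("pseudoinverse:   ", x_pinv)
print("lstsq:           ", x_lstsq)
print("A @ x_hat:       ", A @ x_pinv)
print("A^T residual:    ", A.T @ residual)
```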

###### *Graph Theory*

**what is a combinatorial Laplacian?**


The combinatorial Laplacian (often simply referred to as the Laplacian) is a matrix associated with a graph, which captures certain structural properties of the graph. Specifically, it's the matrix difference between the degree matrix and the adjacency matrix of the graph. Here's how to construct it:

1. **Adjacency Matrix (A)**:
For a graph $G$ with $n$ vertices, its adjacency matrix $A$ is an $n \times n$ matrix where:

> $A_{ij} = \begin{cases} 1 & \text{if vertex } i \text{ is adjacent to vertex } j \\ 0 & \text{otherwise} \end{cases}$

2. **Degree Matrix (D)**:
The degree matrix is a diagonal matrix where the diagonal entries correspond to the degrees of the vertices. If $d_i$ is the degree of the $i$-th vertex, then:

> $D_{ij} = \begin{cases} d_i & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

3. **Combinatorial Laplacian (L)**:
The combinatorial Laplacian is then defined as:

> $L = D - A$

So, the entry $L_{ij}$ of the Laplacian is:

> $L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -1 & \text{if vertex } i \text{ is adjacent to vertex } j \\ 0 & \text{otherwise} \end{cases}$

The combinatorial Laplacian has various properties and applications in graph theory, spectral graph theory, network analysis, and other areas. For example, the smallest eigenvalue of the Laplacian is always zero, and its multiplicity is related to the number of connected components in the graph. Another important application is in graph spectral clustering, where the eigenvalues and eigenvectors of the Laplacian are used to cluster the vertices of a graph.
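
The construction and the statement about the zero eigenvalue can be sketched as follows. The edge list is a hypothetical example (a triangle plus a separate single edge, i.e. two connected components), and the tolerance used to decide which eigenvalues count as zero is an assumption of this sketch.

```python
import numpy as np

# Hypothetical undirected graph on 5 vertices with two components:
# a triangle {0, 1, 2} and a single edge {3, 4}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n = 5

# Adjacency matrix: 1 where two vertices share an edge.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Degree matrix: vertex degrees on the diagonal.
D = np.diag(A.sum(axis=1))

# Combinatorial Laplacian.
L = D - A

# L is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(L)
num_components = int(np.sum(np.isclose(eigenvalues, 0.0, atol=1e-9)))

print("L =\n", L)
print("eigenvalues:", np.round(eigenvalues, 6))
print("connected components:", num_components)
```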

Graph theory plays a significant role in quantum computing and quantum information theory. Here are a few ways in which it's important:

1. **Quantum walks**: Quantum walks are the quantum analogs of classical random walks. In a classical random walk, a "walker" moves randomly around a graph. Quantum walks extend this idea to the quantum realm, and their properties have been used to design new quantum algorithms. The structure of the graph can greatly impact the behavior of the quantum walk.

2. **Quantum error correction**: Graph-based quantum error correction codes, such as the Toric code and surface code, are critical for protecting quantum information from errors due to environmental noise. These codes can be represented using mathematical objects from graph theory.

3. **Graph states**: In quantum computing, graph states are a particular kind of multi-qubit state which is associated with a graph. They are used in quantum error correction, quantum communication, and quantum computing. For instance, measurement-based quantum computation often makes use of graph states.

4. **Quantum algorithms**: Certain quantum algorithms, like the Quantum Search Algorithm (a generalized version of Grover's Algorithm), can be understood in terms of graph theory. They can be interpreted as searching a graph for a marked node.

5. **Network Theory and Quantum Entanglement**: Network theory, which is a part of graph theory, can be used to understand and quantify entanglement in quantum systems. This is particularly important in understanding quantum many-body systems and the concept of quantum teleportation.

So, while not all aspects of quantum computing require knowledge of graph theory, many of the key concepts and methods are deeply interconnected with it.


https://www.youtube.com/watch?v=M9F7zT9Gg-k

https://www.youtube.com/watch?v=skaXrTtQyJA

https://www.youtube.com/watch?v=2h12m-3zQ0M

https://www.youtube.com/watch?v=-Afa1WI3iug

**Spectral Graph Theory**

* In [spectral graph theory](https://en.m.wikipedia.org/wiki/Spectral_graph_theory), the [characteristic polynomial](https://en.m.wikipedia.org/wiki/Characteristic_polynomial) of a [graph](https://en.m.wikipedia.org/wiki/Graph_(discrete_mathematics)) is the characteristic polynomial of its [adjacency matrix](https://en.m.wikipedia.org/wiki/Adjacency_matrix).

* Spectral graph theory is the study of the properties of a graph in relationship to the **characteristic polynomial, eigenvalues, and eigenvectors of matrices associated with the graph**, such as its adjacency matrix or [Laplacian matrix](https://en.m.wikipedia.org/wiki/Laplacian_matrix) = Degree Matrix - Adjacency Matrix

* One of the reasons that the eigenvalues of matrices have meaning is that they arise as the solution to natural optimization problems

> The largest eigenvalue of a k-regular graph is k (Frobenius' theorem), and **its multiplicity is the number of connected components of the graph**.



http://cs-www.cs.yale.edu/homes/spielman/sagt/sagt.pdf

https://www.cs.cmu.edu/~venkatg/teaching/15252-sp20/notes/Spectral-graph-theory.pdf

**Laplacian matrix:**

> $L(G) = D - A = B^T B \quad$ for some matrix $B$

* $L$ = [Laplacian matrix](https://en.m.wikipedia.org/wiki/Laplacian_matrix)
* $D$ = [degree matrix](https://en.m.wikipedia.org/wiki/Degree_matrix), entries contain degrees of each vertex
* $A$ = [adjacency matrix](https://en.m.wikipedia.org/wiki/Adjacency_matrix): 1 if edge, 0 if no edge
* $B$ = the "directed" node-edge [incidence matrix](https://en.m.wikipedia.org/wiki/Incidence_matrix) of G
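
A sketch of the identity $L = D - A = B^{T} B$ on a small hypothetical graph. It assumes that $B$ has one row per edge and one column per vertex, with $+1$ and $-1$ marking the two endpoints of each edge; the orientation chosen for each edge is arbitrary and does not affect $B^{T} B$.

```python
import numpy as np

# Hypothetical graph: a triangle {0, 1, 2} with a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

# Oriented incidence matrix: one row per edge, +1 / -1 at its endpoints.
B = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    B[e, i] = 1.0
    B[e, j] = -1.0

# Adjacency and degree matrices of the same graph.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))

L_from_B = B.T @ B
print("B^T B equals D - A:", np.allclose(L_from_B, D - A))

# Consequence: x^T L x = ||B x||^2 >= 0, so L is positive semidefinite.
x = np.array([1.0, -2.0, 0.5, 3.0])
print("x^T L x =", x @ L_from_B @ x, " ||Bx||^2 =", np.sum((B @ x) ** 2))
```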

**Description of Laplacian matrix**
* The Laplacian matrix acts as a linear operator on real-valued functions defined on the vertices of the graph
* Diagonals have degrees of vertices
* Off-diagonal contain edges marked as -1
* Contains all information about graph (you can recreate the graph just from laplacian).
* And you can nicely apply linear algebra to analyse the graph

**Properties of Laplacian matrix:**
1. L(G) is symmetric
2. L(G) has:
  * real-valued, non-negative eigenvalues and
  * real-valued, orthogonal eigenvectors
3. L(G) is always a positive-semidefinite matrix
4. $\lambda = 0$ is always an eigenvalue, with the all-ones vector $\mathbf{1}$ as eigenvector (every row of $L$ sums to zero, so $L\mathbf{1} = 0$)


**Benefits of calculating Eigenvalues and Eigenvectors of Laplacian matrix**
* The second-smallest eigenvalue (**spectral gap**, also called algebraic connectivity) is the minimum of a natural energy function on the graph and can be computed efficiently.
* The spectrum of the Laplacian matrix can help with
  * **optimization problems** on graphs (e.g. questions related to shortest paths or Hamiltonian circuits, such as the traveling salesman problem)
  * **graph partitioning problems** (optimal cut): reveal sparse connections between clusters and find bottlenecks (in social media these relate to communities, in networks to critical points)
  * the traveling salesman problem and optimal graph partitioning are NP-hard, so the spectrum yields relaxations and heuristics rather than exact solutions
* In physics: **Fundamental modes (harmonics)** are given by the Eigenvectors of the graph laplacian [Spectral Partitioning Part 3 Algebraic Connectivity](https://www.youtube.com/watch?v=Vng9lkibGEE)

**Special 1: The graph Laplacian spectrum, meaning its eigenvalues, tells us something about the underlying connectivity of the graph**
* A graph has k connected components if and only if exactly the k smallest eigenvalues are zero: $\lambda_0 = \lambda_1 = \dots = \lambda_{k-1} = 0$ (Fiedler 1973).
* Spectrum of L(G) $\rightleftharpoons$ Connectivity of G
* The number of (connected) components of the graph equals the dimension of the nullspace (= kernel, the set of solutions of $Lx = 0$) of the Laplacian, which is also the algebraic multiplicity (= how many times the eigenvalue occurs) of the eigenvalue 0.
* The eigenvectors of the Laplacian are the directions that the matrix only scales and does not rotate; they are the principal axes of the ellipsoid-like level set $x^T L x = 1$ of its quadratic form.
* **Spectral graph theory asks questions about this ellipsoid to determine graph behavior, such as what the fastest way from dot to dot is** and how many pathways there are from dot to dot [Source](https://www.youtube.com/watch?v=njatNunHC_o)
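The component count described above can be checked without any floating-point eigenvalue solver. The sketch below computes the nullity of the Laplacian exactly (for a symmetric matrix this equals the multiplicity of the eigenvalue 0) and compares it with a direct component count. The example graph and all function names are hypothetical.

```python
from fractions import Fraction

def laplacian(n, edges):
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def count_components(n, edges):
    """Count connected components with a depth-first search."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components

# Hypothetical example: a triangle, a single edge, and an isolated vertex.
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
nullity = n - rank(laplacian(n, edges))
```

Here the graph has three components, and the nullity of its Laplacian is three as well.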

*The [Courant-Fischer theorem](https://de.m.wikipedia.org/wiki/Satz_von_Courant-Fischer) characterizes the eigenvalues of a symmetric positive-definite (3 × 3) matrix $A$ through certain extremal points on an **ellipsoid**. See also the [Rayleigh quotient](https://de.m.wikipedia.org/wiki/Rayleigh-Quotient):*

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ellipsoid_Quadric.png/434px-Ellipsoid_Quadric.png)

**Special 2: $L(G)$ equals $B^T B$ for some matrix $B$**
* $B$ is the "directed" node-edge [incidence matrix](https://en.m.wikipedia.org/wiki/Incidence_matrix) of G

> $\min_{y \in \mathbb{R}^{n}} f(y)=\sum_{(i, j) \in E}\left(y_i-y_j\right)^2=y^T L y$

* [Rayleigh quotient](https://de.m.wikipedia.org/wiki/Rayleigh-Quotient) with Rayleigh's theorem, restricting $y$ to unit vectors orthogonal to the all-ones vector:
  * $\lambda_2=\min f(y)$ : The minimum value of $f(y)$ is given by the $2^{\text {nd }}$ smallest eigenvalue $\lambda_2$ of the Laplacian matrix $\boldsymbol{L}$
  * $\mathrm{x}=\arg \min _{\mathrm{y}} \boldsymbol{f}(\boldsymbol{y})$ : The optimal solution for $y$ is given by the corresponding eigenvector $\boldsymbol{x}$, referred to as the [Fiedler vector (Algebraic_connectivity)](https://en.m.wikipedia.org/wiki/Algebraic_connectivity)

* see how to create Laplacian matrix from incidence matrix: [Spectral Partitioning, Part 1 The Graph Laplacian](https://www.youtube.com/watch?v=rVnOANM0oJE)
* B captures the relationship between nodes and edges: with rows as nodes and columns as edges of G, each column marks the start and the sink of one edge (in this layout $L = B B^T$)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1390.png)

Source: [(Lecture 14) Graph Laplacians](https://www.youtube.com/watch?v=M9F7zT9Gg-kI)

**How can you use this property? - The number of cut edges equals $\frac{1}{4} x^T L(G) x$ under specific constraints** (each entry of $x$ is $+1$ or $-1$, depending on the side of the cut).
  * Meaning: if you want to minimize the number of cut edges, you should minimize this product (a combinatorial optimization problem).
  * Cut edges: find the minimal number of connections, the bottleneck, the border between two clusters.
  * This problem is, however, NP-hard. You need to relax the constraints.
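A small brute-force sketch of the cut formula, assuming the usual formulation in which $x_i = +1$ for vertices on one side and $x_i = -1$ on the other, and the two sides have equal size. The quadratic form is evaluated as a sum of $(x_i - x_j)^2$ over the edges; the graph is a hypothetical example.

```python
from itertools import combinations

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6

def quadratic_form(x):
    """x^T L x, written as a sum of (x_i - x_j)^2 over the edges."""
    return sum((x[u] - x[v]) ** 2 for u, v in edges)

def cut_size(side_a):
    """Number of edges with exactly one endpoint in side_a."""
    return sum((u in side_a) != (v in side_a) for u, v in edges)

best = None
for side_a in combinations(range(n), n // 2):      # balanced partitions only
    x = [1 if v in side_a else -1 for v in range(n)]
    # Each cut edge contributes (1 - (-1))^2 = 4, every other edge contributes 0.
    assert quadratic_form(x) == 4 * cut_size(set(side_a))
    if best is None or cut_size(set(side_a)) < best[0]:
        best = (cut_size(set(side_a)), set(side_a))
```

The exhaustive loop visits every balanced partition, which is only feasible for tiny graphs; this is the combinatorial problem that the relaxation replaces.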

*If you relax it you can combine it with the [Courant-Fischer minimax theorem (Satz von Courant-Fischer)](https://de.m.wikipedia.org/wiki/Satz_von_Courant-Fischer):*
* If we allow any vector y that is normalized in a certain way and whose elements sum to 0, then the vector y that minimizes this quantity is q1.
* q1 is the eigenvector corresponding to the second-smallest eigenvalue.
* The minimum value simplifies to something proportional to that eigenvalue.
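The relaxed problem can be sketched with NumPy: compute the eigenvector of the second-smallest Laplacian eigenvalue and split the vertices by the sign of their entries. Sign-based rounding is one common choice and is an assumption here, as is the example graph.

```python
import numpy as np

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6

L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1
    L[v, v] += 1
    L[u, v] -= 1
    L[v, u] -= 1

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
lambda_2 = eigenvalues[1]
fiedler = eigenvectors[:, 1]

# Relaxed solution -> partition: split the vertices by the sign of their entry.
side_a = {v for v in range(n) if fiedler[v] > 0}
side_b = set(range(n)) - side_a
```

For this graph the two sides are the two triangles, so the single bridging edge is the cut. The overall sign of an eigenvector is arbitrary, so which triangle ends up in `side_a` may vary.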

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1388.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1387.png)

**Application example: Problem statement for using the Laplacian matrix of a graph:**

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1384.png)

Source: [Lecture 30 — The Graph Laplacian Matrix (Advanced) | Stanford University](https://www.youtube.com/watch?v=FRZvgNvALJ4)


Sum over all neighbouring edges:

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1385.png)


**For the solution see** the [Rayleigh quotient](https://de.m.wikipedia.org/wiki/Rayleigh-Quotient), used to find the second-smallest eigenvalue of the Laplacian matrix (spectral gap):

> $(R_{M}(x)=) \quad \lambda_2 = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{x^{T} M x}{x^{T}x}$ with matrix $M$ being the Laplacian matrix $L$

Meaning of ($\min x^T L x$): take the label of one endpoint of an edge, subtract the label of the other endpoint, square the difference, and sum this over all edges. (See also the green note at the bottom of the image on why quadratic forms are relevant here!)
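The verbal description of $x^T L x$ can be checked numerically. This is a minimal sketch on a hypothetical path graph with arbitrary labels; it evaluates the matrix expression and the edge sum separately and also forms the Rayleigh quotient of that particular $x$.

```python
# Hypothetical example: the path graph 0 - 1 - 2 - 3 with arbitrary vertex labels x.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
x = [3, -1, 0, 2]

L = [[0] * n for _ in range(n)]
for u, v in edges:
    L[u][u] += 1
    L[v][v] += 1
    L[u][v] -= 1
    L[v][u] -= 1

# Left-hand side: the matrix expression x^T L x.
xLx = sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Right-hand side: squared label differences, summed over the edges.
edge_sum = sum((x[u] - x[v]) ** 2 for u, v in edges)

# Rayleigh quotient of this particular x with respect to L.
rayleigh = xLx / sum(xi * xi for xi in x)
```

Both expressions evaluate to the same number. The labels used here do not sum to zero, so this `rayleigh` value is not a candidate for $\lambda_2$; it only illustrates the quotient.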

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1381.png)

Source: [Lecture 33 — Spectral Graph Partitioning Finding a Partition (Advanced) | Stanford](https://www.youtube.com/watch?v=siCPjpUtE0A)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1382.png)


![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1383.png)

Source: [Lecture 33 — Spectral Graph Partitioning Finding a Partition (Advanced) | Stanford](https://www.youtube.com/watch?v=siCPjpUtE0A)

> **Note that the largest eigenvalue of the adjacency matrix corresponds to the smallest eigenvalue of the Laplacian.** But: whereas the eigenvector of the smallest Laplacian eigenvalue is a constant vector, the eigenvector of the largest adjacency eigenvalue, called the Perron vector, need not be. Perron-Frobenius theory tells us that this eigenvector can be chosen non-negative, and that the largest eigenvalue is an upper bound on the absolute value of the smallest eigenvalue. These are equal precisely when the (connected) graph is bipartite.
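For a k-regular graph the correspondence is exact, because $L = kI - A$. The sketch below checks this on the cycle $C_6$, chosen as a hypothetical example because it is both 2-regular and bipartite; for non-regular graphs the relation between the two spectra is looser.

```python
import numpy as np

# Hypothetical example: the cycle C6, which is 2-regular and bipartite.
n, k = 6, 2
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1
L = k * np.eye(n) - A          # D = k*I for a k-regular graph

adj = np.linalg.eigvalsh(A)    # ascending order
lap = np.linalg.eigvalsh(L)    # ascending order

# For a k-regular graph every Laplacian eigenvalue is k minus an adjacency
# eigenvalue, so the largest adjacency eigenvalue pairs with Laplacian eigenvalue 0.
pairs_match = np.allclose(lap, (k - adj)[::-1])

# Bipartite case: largest and smallest adjacency eigenvalues have equal magnitude.
largest, smallest = adj[-1], adj[0]
```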


> Quantum Application with Graphs: https://medium.com/quandela/exploring-graph-problems-with-single-photons-and-linear-optics-4f3d5848add8

**Graph & Graph Theory**

* In [discrete mathematics](https://en.m.wikipedia.org/wiki/Discrete_mathematics), and more specifically in [graph theory](https://en.m.wikipedia.org/wiki/Graph_theory), a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related".

* The objects correspond to mathematical abstractions called vertices (also called nodes or points) and each of the related pairs of vertices is called an edge (also called link or line).


**Algebraic Graph Theory**

* [Algebraic graph theory](https://en.m.wikipedia.org/wiki/Algebraic_graph_theory) is a branch of mathematics in which algebraic methods are applied to problems about graphs. This is in contrast to geometric, combinatoric, or algorithmic approaches.

* There are three main branches of algebraic graph theory, involving the use of linear algebra, the use of group theory, and the study of graph invariants.

* **Using linear algebra**:

  * The first branch of algebraic graph theory involves the study of graphs in connection with linear algebra. Especially, it studies the [spectrum (Eigendecomposition of a matrix)](https://en.m.wikipedia.org/wiki/Eigendecomposition_of_a_matrix) of the adjacency matrix, or the Laplacian matrix of a graph - this part of algebraic graph theory is also called [spectral graph theory](https://en.m.wikipedia.org/wiki/Spectral_graph_theory).

  * **Several theorems relate properties of the spectrum to other [graph properties (invariant)](https://en.m.wikipedia.org/wiki/Graph_property).** As a simple example, a connected graph with diameter D will have at least D+1 distinct values in its spectrum. Aspects of graph spectra have been used in analysing the synchronizability of networks.

* Video: [Daniel Spielman “Miracles of Algebraic Graph Theory”](https://www.youtube.com/watch?v=CDMQR422LGM)

* Video: [The Unreasonable Effectiveness of Spectral Graph Theory: A Confluence of Algorithms, Geometry and Physics](https://www.youtube.com/watch?v=8XJes6XFjxM)

*Types of Graphs*

**Complete graph**

* In graph theory, a [complete graph](https://en.m.wikipedia.org/wiki/Complete_graph) is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge.

* A complete digraph is a directed graph in which every pair of distinct vertices is connected by a pair of unique edges (one in each direction).

*K7, a complete graph with 7 vertices. A complete graph with $n$ vertices has $\frac{n(n-1)}{2}$ edges (21 for K7).*

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Complete_graph_K7.svg/245px-Complete_graph_K7.svg.png)



**Complete bipartite graph**

* [Complete bipartite graph](https://en.m.wikipedia.org/wiki/Complete_bipartite_graph) (or biclique), a special [bipartite graph](https://en.m.wikipedia.org/wiki/Bipartite_graph) where every vertex on one side of the bipartition is connected to every vertex on the other side

* A complete bipartite graph with m = 5 and n = 3:

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Biclique_K_3_5.svg/320px-Biclique_K_3_5.svg.png)

Bipartite Graph https://www.youtube.com/watch?v=HqlUbSA9cEY



**Directed vs Undirected Graphs**

The applications for directed graphs are many and varied. They can be used to analyze electrical circuits, develop project schedules, find shortest routes, analyze social relationships, and construct models for the analysis and solution of many other problems.

https://faculty.cs.niu.edu/~freedman/241/241notes/241graph.htm


Undirected graphs are more restrictive kinds of graphs. They represent only whether or not a relationship exists between two vertices. They don't however represent a distinction between subject and object in that relationship. One type of graph can sometimes be used to approximate the other.

https://www.baeldung.com/cs/graphs-directed-vs-undirected-graph

**Hamiltonian path**

A [Hamiltonian path](https://en.m.wikipedia.org/wiki/Hamiltonian_path) (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle around a network of six vertices:

![gg](https://upload.wikimedia.org/wikipedia/commons/b/be/Hamiltonian.png)

*Graph Properties*

A **walk** from x to y is a sequence of vertices, starting at x and ending at y, in which consecutive vertices are joined by an edge.

**Length** is the number of edges in a walk.

A **closed walk** occurs when x = y.

A **trail** is a walk with no repeated edges.

A closed trail is a **circuit**.

A **path** is a walk with no repeated vertices (and therefore no repeated edges).

A closed path is a **cycle** (every cycle is also a circuit).

A **simple graph** is loop-free, undirected and has no multiple edges.
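The definitions above can be turned into a small classifier for vertex sequences. This is a sketch for simple undirected graphs only; the function name, the returned labels and the example graph are choices made for this illustration.

```python
def classify(walk, edges):
    """Return the names that apply to a vertex sequence, plus its length.

    `walk` is a list of vertices; `edges` is a set of frozensets for a simple
    undirected graph. Raises ValueError if consecutive vertices are not adjacent.
    """
    steps = [frozenset(pair) for pair in zip(walk, walk[1:])]
    if any(step not in edges for step in steps):
        raise ValueError("not a walk in this graph")
    closed = len(walk) > 1 and walk[0] == walk[-1]
    no_repeated_edges = len(set(steps)) == len(steps)
    # For a closed walk the start vertex legitimately appears twice.
    inner = walk[:-1] if closed else walk
    no_repeated_vertices = len(set(inner)) == len(inner)

    names = ["closed walk" if closed else "walk"]
    if no_repeated_edges:
        names.append("circuit" if closed else "trail")
    if no_repeated_edges and no_repeated_vertices:
        names.append("cycle" if closed else "path")
    return names, len(steps)   # len(steps) is the length of the walk

# Hypothetical example graph: a triangle 0-1-2 with an extra edge 2-3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
```

For instance, `classify([3, 2, 0, 1, 2], edges)` is a trail but not a path, because vertex 2 is visited twice while no edge repeats.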

[INTRODUCTION to GRAPH THEORY - DISCRETE MATHEMATICS](https://www.youtube.com/watch?v=HkNdNpKUByM)

[Introduction to Graph Theory: A Computer Science Perspective](https://www.youtube.com/watch?v=LFKZLXVO-Dg)


**Important cycles and paths in graph theory**

* [**Eulerian path** (Eulerkreisproblem)](https://de.m.wikipedia.org/wiki/Eulerkreisproblem) = every edge exactly once
  * **Euler path** (every edge exactly once, vertices may be visited several times): exists in a connected graph if either 0 or 2 vertices have odd degree and all others have even degree (example: the drawing puzzle "Das Haus vom Nikolaus")
  * **Euler circuit** (more constrained): same as an Euler path, but returns to the starting point; exists if all vertices have even degree. Fleury's algorithm can be used to construct one.

* [**Hamiltonian circuit / cycle** (Hamiltonian path problem)](https://de.m.wikipedia.org/wiki/Hamiltonkreisproblem): every vertex only once
  * Every vertex is visited exactly once and the route returns to the starting point; not every edge has to be used.
  * **Minimum cost Hamiltonian circuit = [traveling salesman problem](https://de.m.wikipedia.org/wiki/Problem_des_Handlungsreisenden).** Its decision version is NP-complete. Compare the [shortest path](https://de.m.wikipedia.org/wiki/Kürzester_Pfad) problem, see also [Pathfinding](https://de.m.wikipedia.org/wiki/Pathfinding)
  * https://www.quora.com/Why-are-quantum-computers-unable-to-solve-the-travelling-salesman-problem-in-polynomial-time: *Quantum computation offers speed improvements for some very specialized problems like integer factorization, but it isn’t known or expected to offer polynomial-time algorithms for generic NP-complete problems like the Traveling Salesman Problem. If it does then NP ⊆ BQP, and most experts don’t regard this as likely at all $^{*}$*.
  * Heuristic algorithms as alternatives to brute force, and related exact algorithms:
    * [Dijkstra's algorithm](https://de.m.wikipedia.org/wiki/Dijkstra-Algorithmus) (exact for shortest paths from a start vertex, not a solver for the traveling salesman problem)
    * Nearest neighbor algorithm: a heuristic, not optimal but feasible; a [greedy algorithm](https://de.m.wikipedia.org/wiki/Greedy-Algorithmus) that always takes the cheapest next route
    * Repeated nearest neighbor algorithm
    * Sorted edges algorithm
    * Kruskal's algorithm for a minimum cost spanning tree (optimal and efficient), for example for energy lines
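The degree conditions for Euler paths and circuits listed above can be checked in a few lines. This sketch assumes the graph is connected and does not verify it; the encoding of the "Haus vom Nikolaus" puzzle (a square with both diagonals and a roof) uses an arbitrary vertex numbering chosen for this illustration.

```python
from collections import Counter

def euler_status(edges):
    """Classify a connected undirected multigraph by its odd-degree vertices.

    Assumes the graph is connected; connectivity is not checked here.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sorted(v for v, d in degree.items() if d % 2 == 1)
    if not odd:
        return "Euler circuit", odd
    if len(odd) == 2:
        return "Euler path", odd      # must start and end at the two odd vertices
    return "neither", odd

# "Haus vom Nikolaus": square 0-1-2-3 with both diagonals and a roof vertex 4.
nikolaus = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (2, 4), (3, 4)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

The house has exactly two odd-degree vertices (the bottom corners), so it can be drawn in one stroke only when starting at one bottom corner and ending at the other; the plain square admits an Euler circuit.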

If you have a fully connected [complete graph](https://en.m.wikipedia.org/wiki/Complete_graph) and want to count its unique Hamiltonian circuits:

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1389.png)

Source: [Graph theory full course for Beginners](https://www.youtube.com/watch?v=sWsXBY19o8I)
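A complete graph $K_n$ has $(n-1)!/2$ unique Hamiltonian circuits when rotations and mirror images of the same tour are identified. The sketch below enumerates them for a hypothetical cost matrix on $K_5$, finds the cheapest tour by brute force, and compares it with the nearest-neighbour heuristic. All names and weights are made up for this illustration.

```python
from itertools import permutations
from math import factorial

def unique_tours(n):
    """All Hamiltonian circuits of K_n, each listed once.

    Fixing the start at vertex 0 removes rotations; requiring tour[1] < tour[-1]
    removes the mirror image of each tour.
    """
    for rest in permutations(range(1, n)):
        if n < 3 or rest[0] < rest[-1]:
            yield (0,) + rest

def tour_cost(tour, w):
    return sum(w[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def nearest_neighbour(w, start=0):
    """Greedy heuristic: always move to the cheapest unvisited vertex."""
    tour, unvisited = [start], set(range(len(w))) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda v: w[tour[-1]][v])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)

# Hypothetical symmetric cost matrix for K_5.
w = [[0, 2, 9, 10, 7],
     [2, 0, 6, 4, 3],
     [9, 6, 0, 8, 5],
     [10, 4, 8, 0, 6],
     [7, 3, 5, 6, 0]]

tours = list(unique_tours(5))
best = min(tours, key=lambda t: tour_cost(t, w))
greedy = nearest_neighbour(w)
```

On this matrix the greedy tour is feasible but more expensive than the brute-force optimum, which illustrates why nearest neighbour is only a heuristic. The factorial growth of `tours` is what makes brute force impractical beyond small n.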

$^{*}$ *Why can’t a quantum computer solve NP complete problems in a polynomial running time (relative to their input)?
Quantum computers, like classical computers, are not currently believed to be able to solve NP-complete problems in polynomial time.

Contrary to Kurt Behnke’s answer, this is not true by definition. It’s a profound open question in mathematics.

As Vadim Yakovlevich correctly points out, it’s partly a matter of no one having found a polynomial-time quantum algorithm for NP-complete problems after decades of trying—just like no one has found a polynomial-time classical algorithm.

But it’s possible to say more than that. From reading popular articles, many people have the vague impression that a quantum computer could just try every possible solution in parallel. And if that’s all you know about them, then it’s indeed a mystery why they couldn’t solve NP-complete problems in polynomial time!

The central difficulty is that, for a computer to be useful, at some point you need to measure it to observe an answer. And if you just measured an equal superposition over all possible answers, not having done anything else, the rules of QM dictate that you’ll just see a random answer. And of course, if you’d just wanted a random answer, you could’ve picked one yourself with a lot less trouble!

So the name of the game, with every quantum algorithm, is somehow orchestrating a pattern of constructive and destructive quantum interference that boosts the probability of the correct answer (even though you yourself don’t know in advance which answer is the correct one!), and that does so efficiently. Famously, in 1994 Peter Shor showed how to do that for the problem of factoring integers, and a few related problems in number theory—but he was able to do so only by exploiting extremely special structure in those problems.

For more “generic” search problems, like NP-complete problems, the best we generally know is Grover’s algorithm: a quantum algorithm that’s able to find the right answer (i.e., concentrate a large fraction of the amplitude on that answer) in roughly the square root of the number of steps that would be needed classically. So you get some speedup, but not an exponential one.

But we also know that, if your search problem is a “black box”—i.e., you can just pick candidate solutions and evaluate if they’re correct, and you don’t know anything more about the problem’s structure—then Grover’s algorithm is the best that even a quantum computer can do. This is the content of the so-called BBBV (Bennett, Bernstein, Brassard, Vazirani) Theorem, which has several proofs, all of them crucially relying on the linearity of unitary evolution. The BBBV Theorem gives a fundamental explanation for why any fast quantum algorithm for NP-complete problems would have to look very different from anything that we know today—much like a fast classical algorithm would have to*.

**Graph Properties (Invariants)**

* A [graph property (invariant)](https://en.m.wikipedia.org/wiki/Graph_property) is a property of graphs that depends only on the abstract structure, not on graph representations such as particular labellings or drawings of the graph.

* An example graph, with the properties of

  * being [planar](https://en.m.wikipedia.org/wiki/Planar_graph) (it can be drawn on the plane in such a way that its edges intersect only at their endpoints / no edges cross each other)

  * and being [connected](https://en.m.wikipedia.org/wiki/Connectivity_(graph_theory)),

  * and with order 6,

  * size 7,

  * [diameter](https://en.m.wikipedia.org/wiki/Distance_(graph_theory)) 3 (the largest distance between two vertices, where distance is the number of edges in a shortest path, also called a graph geodesic, connecting them)

  * [girth](https://en.m.wikipedia.org/wiki/Girth_(graph_theory)) 3 (girth of an undirected graph is the length of a shortest cycle contained in the graph)

  * vertex [connectivity](https://en.m.wikipedia.org/wiki/Connectivity_(graph_theory)) 1, and

  * [degree sequence](https://en.m.wikipedia.org/wiki/Degree_(graph_theory)#Degree_sequence) (3, 3, 3, 2, 2, 1):

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/333px-6n-graf.svg.png)
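Several of the listed invariants can be recomputed with breadth-first search. The edge list below is assumed to match the pictured six-vertex graph; it is transcribed for this sketch and not taken from the notebook.

```python
from collections import deque

# Edge list assumed to match the pictured six-vertex graph (vertices 1..6).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs(root):
    """Distances from root, plus the shortest cycle detected while exploring."""
    dist, parent = {root: 0}, {root: None}
    shortest_cycle = float("inf")
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], parent[w] = dist[u] + 1, u
                queue.append(w)
            elif w != parent[u]:
                # A non-tree edge closes a cycle through the BFS tree.
                shortest_cycle = min(shortest_cycle, dist[u] + dist[w] + 1)
    return dist, shortest_cycle

results = [bfs(v) for v in adj]
order = len(adj)
size = len(edges)
degree_sequence = sorted((len(nb) for nb in adj.values()), reverse=True)
connected = all(len(dist) == order for dist, _ in results)
diameter = max(max(dist.values()) for dist, _ in results)
girth = min(cycle for _, cycle in results)
```

Relabelling the vertices changes none of these values, which is what makes them invariants.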

**Graph enumeration**

* In combinatorics, an area of mathematics, [graph enumeration](https://en.m.wikipedia.org/wiki/Graph_enumeration) describes a class of combinatorial enumeration problems in which one must count undirected or directed graphs of certain types, typically as a function of the number of vertices of the graph.

* These problems may be solved either exactly (as an [algebraic enumeration](https://en.m.wikipedia.org/wiki/Algebraic_enumeration) problem) or asymptotically.

* The number of labeled n-vertex simple undirected graphs is $2^{\frac{n(n-1)}{2}}$

* The number of labeled n-vertex simple directed graphs is $2^{n(n-1)}$

* **Number of connected labeled graphs with n nodes**: The number of distinct connected labeled graphs with n nodes is tabulated in the [On-Line Encyclopedia of Integer Sequences](https://en.m.wikipedia.org/wiki/On-Line_Encyclopedia_of_Integer_Sequences) as sequence [A001187](https://oeis.org/A001187). The first few non-trivial terms are

$\begin{array}{|l|l|}
\hline {\text { n }} & {\text { graphs }} \\
\hline 1 & 1 \\
\hline 2 & 1 \\
\hline 3 & 4 \\
\hline 4 & 38 \\
\hline 5 & 728 \\
\hline6 & 26704 \\
\hline 7 & 1866256 \\
\hline 8 & 251548592 \\
\hline
\end{array}$
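The tabulated values can be reproduced with a standard recurrence: every labeled graph on n vertices is determined by the connected component containing vertex 1 (of some size k) together with an arbitrary graph on the remaining n − k vertices. The function name is hypothetical.

```python
from math import comb

def connected_labeled_graphs(n_max):
    """Number of connected labeled graphs on 1..n_max vertices.

    Uses the recurrence obtained by splitting all 2^C(n,2) labeled graphs
    according to the size k of the component containing vertex 1.
    """
    counts = [0, 1]                      # counts[n]; index 0 is unused
    for n in range(2, n_max + 1):
        disconnected = sum(
            comb(n - 1, k - 1) * counts[k] * 2 ** comb(n - k, 2)
            for k in range(1, n)
        )
        counts.append(2 ** comb(n, 2) - disconnected)
    return counts[1:]
```

Calling `connected_labeled_graphs(8)` returns the eight values in the table, using integer arithmetic only.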






**Genus of a Graph**

[Genus](https://en.m.wikipedia.org/wiki/Genus_(mathematics)) (plural genera) has a few different, but closely related, meanings. Intuitively, the genus is the number of "holes" of a surface. A sphere has genus 0, while a torus has genus 1.

A genus-2 surface:

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Double_torus_illustration.png/219px-Double_torus_illustration.png)

**The genus of a graph is the minimal integer n such that the graph can be drawn without crossing itself on a sphere with n handles** (i.e. an orientable surface of genus n). Thus, a planar graph has genus 0, because it can be drawn on a sphere without self-crossing.



**Graph embedding**

In topological graph theory, an [embedding](https://en.m.wikipedia.org/wiki/Graph_embedding) (also spelled imbedding) of a graph G on a surface $\Sigma$ is a representation of G on $\Sigma$ in which points of $\Sigma$ are associated with vertices and simple arcs (homeomorphic images of $[0,1]$) are associated with edges in such a way that:

* the endpoints of the arc associated with an edge $e$ are the points associated with the end vertices of $e$,
* no arcs include points associated with other vertices,
* two arcs never intersect at a point which is interior to either of the arcs.

> Often, an embedding is regarded as an equivalence class (under homeomorphisms of $\Sigma$) of representations of the kind just described.

**Spanning tree**

A [spanning tree](https://en.m.wikipedia.org/wiki/Spanning_tree) T of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G.

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/4x4_grid_spanning_tree.svg/252px-4x4_grid_spanning_tree.svg.png)

A special kind of spanning tree, the Xuong tree, is used in topological graph theory to find graph embeddings with maximum [genus](https://en.m.wikipedia.org/wiki/Genus_(mathematics)) (the number of "holes" of a surface).

**Cover times of random walks on graphs**

Cover times: A cover time is the number of steps needed for a random walk to visit all vertices in a given graph.

http://uu.diva-portal.org/smash/get/diva2:319078/FULLTEXT01.pdf

**Cover times of graphs and the Gaussian free field**

The maximum of a Gaussian free field on a graph is closely related to the graph's cover time, i.e. the number of time steps a random walk needs to visit every vertex.

https://tcsmath.wordpress.com/2010/12/09/open-question-cover-times-and-the-gaussian-free-field/

https://en.m.wikipedia.org/wiki/Gaussian_free_field

![ggg](https://upload.wikimedia.org/wikipedia/commons/9/93/Discrete_Gaussian_free_field_on_60_x_60_square_grid.png)

**Circuit rank**

* [Circuit rank](https://en.m.wikipedia.org/wiki/Circuit_rank), cyclomatic number, cycle rank, or nullity of an undirected graph is the **minimum number of edges that must be removed from the graph to break all its cycles, making it into a tree or forest**.

* It is equal to the number of independent cycles in the graph (the size of a cycle basis).

* This graph has circuit rank r = 2 because it can be made into a tree by removing two edges, for instance the edges 1–2 and 2–3, but removing any one edge leaves a cycle in the graph:

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/320px-6n-graf.svg.png)
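The circuit rank can be computed by growing a spanning forest edge by edge: every edge whose endpoints are already connected closes an independent cycle. The count equals $m - n + c$ (edges minus vertices plus components). The edge list is assumed to match the pictured six-vertex graph, and the function name is hypothetical.

```python
def circuit_rank(vertices, edges):
    """Grow a spanning forest edge by edge; every edge that would close a
    cycle is counted. The count equals m - n + c."""
    label = {v: v for v in vertices}         # component label per vertex
    closing_edges = 0
    for u, v in edges:
        if label[u] == label[v]:
            closing_edges += 1               # endpoints already connected
        else:
            old, new = label[u], label[v]
            for x in label:
                if label[x] == old:
                    label[x] = new           # merge the two components
    components = len(set(label.values()))
    assert closing_edges == len(edges) - len(label) + components
    return closing_edges

# Edge list assumed to match the pictured six-vertex graph.
vertices = range(1, 7)
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
pruned = [e for e in edges if e not in [(1, 2), (2, 3)]]
```

With all seven edges the rank is 2; after removing 1–2 and 2–3 it drops to 0, so what remains is a tree.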




**Degree**

* Degree of a vertex = how many neighbours does the vertex have?

* [Degree](https://en.m.wikipedia.org/wiki/Degree_(graph_theory)#Degree_sequence) (or valency) of a vertex of a graph is the number of edges that are incident to the vertex; in a multigraph, a loop contributes 2 to a vertex's degree, for the two ends of the edge.

* The maximum degree of a graph G, denoted by $\Delta (G)$, and the minimum degree of a graph, denoted by $\delta (G)$, are the maximum and minimum of its vertices' degrees.

* In a [regular graph](https://en.m.wikipedia.org/wiki/Regular_graph) (=each vertex has the same number of neighbors), every vertex has the same degree, and so we can speak of the degree of the graph.

* A [complete graph](https://en.m.wikipedia.org/wiki/Complete_graph) (denoted $K_{n}$, where n is the number of vertices in the graph) is a special kind of regular graph where all vertices have the maximum possible degree, $n-1$.

* The degree sum formula (handshaking lemma) states that, for a graph $G=(V,E)$: $\sum_{v\in V}\deg(v)=2|E|$
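A short sketch of the degree definitions and the handshaking lemma on a hypothetical multigraph. Counting both ends of every edge automatically makes a loop contribute 2 to its vertex.

```python
from collections import Counter

def degrees(edges):
    """Vertex degrees of an undirected multigraph; a loop (v, v) counts twice."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Hypothetical multigraph with a loop at vertex 3 and a double edge 1-2.
edges = [(1, 2), (1, 2), (2, 3), (3, 3), (3, 4)]
deg = degrees(edges)
max_degree = max(deg.values())     # Delta(G)
min_degree = min(deg.values())     # delta(G), over vertices that appear in an edge
```

The degrees sum to twice the number of edges, as the lemma states. Isolated vertices never appear in the edge list, so they would have to be added separately with degree 0.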

*Two non-isomorphic graphs with the same degree sequence (3, 2, 2, 2, 2, 1, 1, 1):*

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/Conjugate-dessins.svg/240px-Conjugate-dessins.svg.png)

**(Connected) Component (isolated subgraph) = Betti number $B_0$**

* The number of (connected) [components](https://en.m.wikipedia.org/wiki/Component_(graph_theory)) of a topological space is an important topological invariant, the zeroth Betti number $B_0$, and the number of components of a graph is an important graph invariant, and **in topological graph theory it can be interpreted as the zeroth Betti number of the graph**.

* The number of components arises in other ways in graph theory as well: **In algebraic graph theory number of components equals the multiplicity of 0 as an eigenvalue of the Laplacian matrix of a finite graph**.

* Likewise: the largest eigenvalue of the adjacency matrix of a k-regular graph is k (Frobenius' theorem), and **its multiplicity is the number of connected components of the graph**. Taken from: https://de.m.wikipedia.org/wiki/Spektrum_(Graphentheorie)

* Let G be a d-regular graph. The algebraic multiplicity of eigenvalue 0 for the Laplacian matrix is exactly 1 iff G is connected. https://math.uchicago.edu/~may/REU2013/REUPapers/Marsden.pdf

* https://www.sciencedirect.com/science/article/pii/S0024379518300156

**Connected Components**

https://en.m.wikipedia.org/wiki/Component_(graph_theory)

https://en.m.wikipedia.org/wiki/Connected_space
* How a graph can be used to capture connected components:
* Any graph gives us a topological space (for example, joining the sides of a square yields one connected component)
* A graph captures connected components by putting an equivalence relation on the set of vertices.
* **The equivalence relation is generated by declaring the tail of an edge (vertex 1) equivalent to the head of the same edge (vertex 2), i.e. the two end vertices of every edge are equivalent**.
* To generate an equivalence relation we have to enforce reflexivity, symmetry and transitivity.
* The resulting equivalence classes are the connected components
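The equivalence-relation view translates directly into a union-find (disjoint-set) structure, which is one common way to compute the classes. This is a sketch under that assumption; vertex names and edges are hypothetical.

```python
def components(vertices, edges):
    """Equivalence classes of the relation generated by 'u ~ v if uv is an edge'."""
    parent = {v: v for v in vertices}      # reflexivity: each vertex is its own class

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)          # merging classes gives symmetry and transitivity

    classes = {}
    for v in vertices:
        classes.setdefault(find(v), set()).add(v)
    return sorted(classes.values(), key=min)

# Hypothetical example: edges are (tail, head) pairs; direction is ignored.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
```

Vertices `a` and `c` are never joined by an edge, yet they land in the same class through `b`; that is the transitivity step.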

**Rank of a graph**

* the [rank of an undirected graph](https://en.m.wikipedia.org/wiki/Rank_(graph_theory)) has two unrelated definitions. Let n equal the number of vertices of the graph.

* In the [matrix theory](https://en.m.wikipedia.org/wiki/Matrix_(mathematics)) of graphs the rank r of an undirected graph is defined as the rank of its adjacency matrix. Analogously, the [nullity](https://en.m.wikipedia.org/wiki/Nullity_(graph_theory)) of the graph is the nullity of its adjacency matrix, which equals n − r.

* In the [matroid theory](https://en.m.wikipedia.org/wiki/Matroid) of graphs the rank of an undirected graph is defined as the number n − c, where c is the number of connected components of the graph.

  * Equivalently, the rank of a graph is the rank of the oriented [incidence matrix](https://en.m.wikipedia.org/wiki/Incidence_matrix) associated with the graph.

  * Analogously, the [nullity of the graph](https://en.m.wikipedia.org/wiki/Nullity_(graph_theory)) is the [nullity (Kernel)](https://en.m.wikipedia.org/wiki/Kernel_(linear_algebra)) of its oriented incidence matrix, given by the formula m − n + c, where n and c are as above and m is the number of edges in the graph.

  * The nullity is equal to the first [Betti number](https://en.m.wikipedia.org/wiki/Betti_number) of the graph. The sum of the rank and the nullity is the number of edges.


**Connectivity** (Menger's theorem, max-flow min-cut)

* In mathematics and computer science, [connectivity](https://en.m.wikipedia.org/wiki/Connectivity_(graph_theory)) is one of the basic concepts of graph theory: **it asks for the minimum number of elements (nodes or edges) that need to be removed to separate the remaining nodes into two or more [isolated subgraphs (= Connected component)](https://en.m.wikipedia.org/wiki/Component_(graph_theory)).**

* It is closely related to the theory of network flow problems. The connectivity of a graph is an important measure of its resilience as a network.

* One of the most important facts about connectivity in graphs is [Menger's theorem](https://en.m.wikipedia.org/wiki/Menger%27s_theorem), which characterizes the connectivity and edge-connectivity of a graph in terms of the number of independent paths between vertices. For any two vertices u and v in a connected graph G, the local vertex-connectivity κ(u, v) and the local edge-connectivity λ(u, v) can be determined efficiently using the [max-flow min-cut](https://en.m.wikipedia.org/wiki/Max-flow_min-cut_theorem) algorithm.

* The vertex- and edge-connectivities of a disconnected graph are both 0.

* 1-connectedness is equivalent to connectedness for graphs of at least 2 vertices.

* The [complete graph](https://en.m.wikipedia.org/wiki/Complete_graph) on n vertices has edge-connectivity equal to n − 1. Every other simple graph on n vertices has strictly smaller edge-connectivity.


*This graph becomes disconnected when the right-most node in the gray area on the left is removed:*

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Network_Community_Structure.svg/389px-Network_Community_Structure.svg.png)





**Clique**

* In graph theory, a [clique](https://de.m.wikipedia.org/wiki/Clique_(Graphentheorie)) is **a subset of vertices of an undirected graph in which every pair of vertices is connected by an edge**.

* Deciding whether a graph contains a clique of a given minimum size is called the [clique problem](https://de.m.wikipedia.org/wiki/Cliquenproblem) and, like finding a largest clique, is considered algorithmically hard (NP-complete).

* Finding a clique of a given size in a graph is an [NP-complete problem](https://de.m.wikipedia.org/wiki/NP-Vollständigkeit) (= it belongs to the hardest problems in the class NP, i.e. it is both in NP and NP-hard) and is therefore also a relevant field of research and application in information technology.

*A graph with a clique of size 3:*

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/6n-graf-clique.svg/350px-6n-graf-clique.svg.png)
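A brute-force sketch of the clique decision problem. It tries every subset of k vertices, so it is only meant to illustrate the definition on small graphs. The edge list is assumed to match the pictured six-vertex graph.

```python
from itertools import combinations

def has_clique(adj, k):
    """Brute-force check: is there a set of k pairwise adjacent vertices?

    Tries all C(n, k) subsets, so the running time grows exponentially when k
    grows with n.
    """
    return any(
        all(v in adj[u] for u, v in combinations(subset, 2))
        for subset in combinations(adj, k)
    )

# Edge list assumed to match the pictured six-vertex graph.
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
```

On this graph a clique of size 3 exists (vertices 1, 2 and 5), but none of size 4.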


**Clique complex (Whitney complexes)**

[Clique complexes](https://en.m.wikipedia.org/wiki/Clique_complex), independence complexes, flag complexes, Whitney complexes and conformal hypergraphs are closely related mathematical objects in graph theory and geometric topology that each describe the cliques (complete subgraphs) of an undirected graph.

See also: [Topological Graph Theory](https://en.m.wikipedia.org/wiki/Topological_graph_theory)

**CW Complex**

A [CW complex](https://en.m.wikipedia.org/wiki/CW_complex) (also called cellular complex or cell complex) is a kind of a topological space that is particularly important in algebraic topology. It was introduced by J. H. C. Whitehead to meet the needs of homotopy theory. This class of spaces is broader and has some better categorical properties than simplicial complexes, but still retains a combinatorial nature that allows for computation (often with a much smaller complex). The C stands for "closure-finite", and the W for "weak" topology.

See also: [Graph (topology)](https://en.m.wikipedia.org/wiki/Graph_(topology))

**Abstract simplicial complex**

In combinatorics, an [abstract simplicial complex](https://en.m.wikipedia.org/wiki/Abstract_simplicial_complex) (ASC), often called an abstract complex or just a complex, is a family of sets that is closed under taking subsets, i.e., every subset of a set in the family is also in the family. It is a purely combinatorial description of the geometric notion of a simplicial complex.

For example, in a 2-dimensional simplicial complex, the sets in the family are the triangles (sets of size 3), their edges (sets of size 2), and their vertices (sets of size 1).

Geometric realization of a 3-dimensional abstract simplicial complex:

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Simplicial_complex_example.svg/247px-Simplicial_complex_example.svg.png)

Examples:

* Let G be an undirected graph. The [clique complex](https://en.m.wikipedia.org/wiki/Clique_complex) (flag complexes, Whitney complexes) of G is an ASC whose faces are all cliques (complete subgraphs) of G. The independence complex of G is an ASC whose faces are all independent sets of G (it is the clique complex of the complement graph of G). Clique complexes are the prototypical example of flag complexes. A flag complex is a complex K with the property that every set of elements that pairwise belong to faces of K is itself a face of K.

* Let M be a metric space and δ a real number. The [Vietoris–Rips complex](https://en.m.wikipedia.org/wiki/Vietoris–Rips_complex) is an ASC whose faces are the finite subsets of M with diameter at most δ. It has applications in homology theory, hyperbolic groups, image processing, and mobile ad hoc networking. It is another example of a flag complex (clique complex).



**Simplices**

A [simplex](https://de.m.wikipedia.org/wiki/Simplex_(Mathematik)) is the simplest polytope in a given dimension; its faces are again simplices (vertices, edges, triangles, ...).

* 0-simplex: point
* 1-simplex: edge (line segment)
* 2-simplex: triangle (3 pairwise connected points, filled in)
* 3-simplex: tetrahedron (solid, 3 dimensions)

A k-simplex is k-dimensional and is the convex hull of (k+1) affinely independent vertices.

* Complete graphs (every vertex is connected to all others) can be interpreted as simplices

* Any graph can be interpreted as a simplicial complex

**Simplicial Complex**

https://en.m.wikipedia.org/wiki/Abstract_simplicial_complex

* A simplicial complex allows the essential properties of triangulable topological spaces to be characterized algebraically.

* Why? To define invariants of the topological space. Simplicial complexes can be seen as higher-dimensional generalizations of neighboring graphs.

* How? The topological space is studied by assembling simplices into a subset of d-dimensional Euclidean space that is [homeomorphic](https://de.m.wikipedia.org/wiki/Homöomorphismus) to the given topological space.

* The "assembly instructions" for the simplices, i.e. the information about how the simplices are glued together, are then characterized purely algebraically as a sequence of [group homomorphisms](https://de.m.wikipedia.org/wiki/Gruppenhomomorphismus).

* Cells can have various dimensions: vertices, edges, triangles, tetrahedra and their higher-dimensional analogues. Complexes are meant to reflect the correct topology of the data.

* If we glue many simplices together in such a way that each intersection is also a simplex (along an edge, for example), we obtain a simplicial complex. If three points are pairwise connected by edges and so form a triangle, we fill in the triangle with a 2-dimensional face. Any four points that are all pairwise connected get filled in with a 3-simplex, etc. The resulting simplicial complex is called the (Vietoris–)Rips complex.

* The persistence diagram is not computed directly from $L_\varepsilon$. Instead, one forms an object called a Čech complex $C_\varepsilon$. Simplicial and cubical complexes are examples of cell complexes, and the Čech complex is an example of a simplicial complex. In practice the Vietoris–Rips complex $V_\varepsilon$ is often used, since the persistent homology defined by $V_\varepsilon$ approximates the persistent homology defined by $C_\varepsilon$.
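The "fill in every pairwise-connected set of points" rule can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not an efficient implementation: the function name `rips_complex`, the parameter `max_dim` and the sample points are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import dist

def rips_complex(points, eps, max_dim=2):
    """Return the simplices (sorted tuples of point indices) of the
    Vietoris-Rips complex at scale eps, up to dimension max_dim."""
    n = len(points)
    # Two points are joined by an edge when their distance is at most eps.
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= eps}
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):            # k = number of vertices
        for verts in combinations(range(n), k):
            # Flag condition: a simplex is present iff all of its edges are.
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)
    return simplices

# Hypothetical sample data: the four corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
small = rips_complex(square, eps=1.0)   # sides only, the diagonals are too long
large = rips_complex(square, eps=1.5)   # diagonals included, triangles fill in
```

At the smaller scale the complex is a hollow square (one loop); at the larger scale every triple is pairwise connected, so the triangles appear and the loop is filled.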


![vvv](https://raw.githubusercontent.com/deltorobarba/repo/master/homology_03.jpg)


**Triangulation**

* In topology, a [triangulation](https://en.m.wikipedia.org/wiki/Triangulation_(topology)) is a **decomposition of a space into simplices** (triangles, tetrahedra or their higher-dimensional generalizations), used to tease out properties of manifolds.

* Manifolds of dimension up to three are always triangulable.

* The original motivation for the Hauptvermutung was to prove the topological invariance of combinatorially defined invariants such as [simplicial homology](https://de.m.wikipedia.org/wiki/Simpliziale_Homologie). Although the Hauptvermutung fails, questions of this kind can often be answered with the [simplicial approximation theorem](https://de.m.wikipedia.org/wiki/Simpliziale_Approximation).

* **A triangulation of a topological space X is an (abstract) simplicial complex K together with a homeomorphism h: |K| → X** from the geometric realization |K| onto X.
https://en.wikipedia.org/wiki/Triangulation_(topology)

* A triangulation offers a concrete way of visualizing spaces that are difficult to see, and helps in computing invariants.

* A two-dimensional sphere (surface of a solid ball) can be approximated by gluing together two-dimensional triangles, and a three-dimensional sphere can be approximated by gluing together three-dimensional tetrahedra.

* Triangles and tetrahedra are examples of more general shapes called simplices, which can be defined in any dimension.




**[Filtration](https://de.m.wikipedia.org/wiki/Filtrierung_(Mathematik)) & Inclusion Maps**

https://en.m.wikipedia.org/wiki/Filtered_algebra

https://en.m.wikipedia.org/wiki/Filtration_(mathematics)

* **With increasing size of d, we are dealing with a sequence of simplicial complexes, each a sub-complex of the next. That is, a simplicial complex constructed from data for some small distance is a subset of the simplicial complex constructed for a larger distance**.

* Equally there is an **[inclusion map](https://en.m.wikipedia.org/wiki/Inclusion_map) from each simplicial complex to the next**.

* **This sequence of simplicial complexes, with inclusion maps, is called a filtration**.

* When we apply homology to a filtration, we obtain an algebraic structure called a persistence module.

* Suppose we compute the **i-th homology with coefficients in a field k**. Then the **homology of each complex Cj is a vector space**, and the **inclusion maps between complexes induce linear maps between the homology vector spaces**.

* The direct sum of the homology vector spaces is an algebraic module, in fact a **graded module over the polynomial ring** k[x]. The variable x acts as a **shift map**, taking each homology generator to its image in the next vector space.

* Furthermore, a structure theorem tells us that **a persistence module decomposes nicely into a direct sum of simple modules**, each corresponding to a bar in the barcode. This means a barcode really is an algebraic structure.
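A small sketch of a filtration, restricted to 0-dimensional homology so that it stays short: as the distance threshold grows, the edge sets are nested and the number of connected components (the 0th Betti number) can only drop. The helper names `edges_at_scale` and `count_components`, the point cloud and the scales are assumptions made for this illustration.

```python
from itertools import combinations
from math import dist

def edges_at_scale(points, d):
    """Edges (1-simplices) of the complex built at distance threshold d."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= d}

def count_components(n, edges):
    """Number of connected components (0th Betti number) via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})

# Hypothetical point cloud on a line, with gaps of length 1, 1 and 3.
points = [(0.0,), (1.0,), (2.0,), (5.0,)]
scales = [0.5, 1.5, 3.5]
filtration = [edges_at_scale(points, d) for d in scales]
betti0 = [count_components(len(points), e) for e in filtration]
```

Each entry of `filtration` is a subset of the next one, which is exactly the inclusion-map property; `betti0` records how components merge as the scale increases. The component that merges last corresponds to the longest finite bar in the degree-0 barcode.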

![ccc](https://raw.githubusercontent.com/deltorobarba/repo/master/homology_01.jpg)


![cscsvs](https://raw.githubusercontent.com/deltorobarba/repo/master/homology_02.jpg)

**Permutations and Combinatorics**

* https://en.m.wikipedia.org/wiki/Combinatorics
* https://de.m.wikipedia.org/wiki/Permutation
* https://de.m.wikipedia.org/wiki/Galoistheorie

Permutations and combinatorics are both terms used in probability theory and statistics. They deal with the arrangement of objects and with the number of ways in which things can be arranged or selected. The basic definitions are:

1. **Permutations**: A permutation is an arrangement of objects in which the order matters. For example, the numbers 1, 2 and 3 have 6 possible permutations: 123, 132, 213, 231, 312 and 321.

2. **Combinatorics**: Combinatorics is the broader term; it is the study of the ways in which objects can be arranged and selected, and it covers both permutations and combinations. Combinations are selections of objects in which the order does not matter. For example, choosing two of the numbers 1, 2 and 3 gives 3 possible combinations: {1,2}, {1,3}, {2,3}.

Combinatorics is therefore the more comprehensive concept: it is the study of methods for counting, arranging and selecting objects, and includes permutations, combinations and similar notions. Permutations are the specific part of this study in which the order of the selected objects matters.
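The two counts from the examples above can be reproduced with Python's standard library. This is only a quick check of the numbers 6 and 3; the variable names are arbitrary.

```python
from itertools import combinations, permutations
from math import comb, factorial

items = [1, 2, 3]

# Order matters: every arrangement of all three numbers.
perms = ["".join(map(str, p)) for p in permutations(items)]

# Order does not matter: every way to choose two of the three numbers.
pairs = [set(c) for c in combinations(items, 2)]

n_perms = factorial(len(items))      # 3! arrangements
n_pairs = comb(len(items), 2)        # "3 choose 2" selections
```

`perms` lists the six strings 123, 132, 213, 231, 312, 321, and `pairs` lists {1,2}, {1,3}, {2,3}.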

###### *Topology*

**Basis: Homomorphiesatz (Fundamental Theorem on Homomorphism)**

> Homomorphiesatz (Isomorphie on quotient group)

Video: [First Isomorphism Theorem for Groups](https://www.youtube.com/watch?v=JiS43Twomsk&list=WL&index=10)

If the quotient group Ker/Img = 0, the sequence is exact at that position (compare simplicial homology). Is Img the normal subgroup (Normalteiler) here? For chain groups, which are abelian, every subgroup is normal, so the quotient is always defined.

The kernel of f is always a normal subgroup of G and the image of f is a subgroup of H. By the fundamental theorem on homomorphisms, the quotient group G/ker(f) is isomorphic to im(f).

Video: [Persistent Homology](https://youtu.be/ktKCzMmDXDk)

Video: [Chapter 6: Homomorphism and (first) isomorphism theorem | Essence of Group Theory](https://youtu.be/2kmIHyD8zTk)

The kernel is a normal subgroup (Normalteiler) of G; it is the coset with "remainder 0". Everything else in the quotient consists of its cosets (with remainder 1, 2, 3, etc., depending on what the normal subgroup is).

......................

* [Homomorphiesatz](https://de.m.wikipedia.org/wiki/Homomorphiesatz)

* https://youtu.be/QA9rrDMlaHc (Homomorphiesatz explained with hares and hunters)

* https://youtu.be/390eRzVSC2k (homomorphism with modulo and a commutative diagram)

**Homomorphiesatz (general)**:

* From a map $f$ between two groups $G$ and $H$ that is neither injective nor surjective, we want to derive a bijective map.

* Step 1: On the right-hand side, discard all elements of $H$ that are not hit by $f$ and focus only on the elements that are hit. (The image of $f$ is a subgroup of $H$.)

* Step 2: The remaining problem is on the left-hand side: several elements of $G$ are sent to one and the same element of $H$. We simply treat such elements as identical, by replacing $G$ with $G / \operatorname{Kern}(f)$ (the quotient group, read "$G$ modulo the kernel of $f$"; for vector spaces, the quotient space of $G$ by the kernel of $f$).
  * This works because $f$ is a homomorphism.
  * If two elements a and b are sent to the same element, they differ by a kernel element: a minus b is sent to 0 and therefore lies in the kernel.
  * Computing modulo the kernel means pretending that this difference does not exist, i.e. a = b. The quotient is an artificially constructed space in which a and b have been made identical (congruent).
  * Recall from modular arithmetic: two numbers are congruent modulo m if their difference is divisible by m. Here the kernel plays the role of the modulus, so the difference must lie in the kernel ("remainder 0").
  * $G$ modulo $\operatorname{Kern}(f)$ is isomorphic (bijective and homomorphic) to the image of $f$, which lies in $H$.

* **The kernel of $f$ is always a normal subgroup of $G$ and the image of $f$ is a subgroup of $H$. By the Homomorphiesatz, the quotient group $G / \operatorname{Kern}(f)$ is [isomorphic (bijective)](https://de.m.wikipedia.org/wiki/Isomorphismus) to the image of $f$.**

**Bedingungen:**

* Let $G$ and $H$ be two groups.
* and let $f$ : $G \rightarrow H$ be a [group homomorphism](https://de.wikipedia.org/wiki/Gruppenhomomorphismus).
* and let $K$ be a normal subgroup (Normalteiler) of $G$ and $\varphi$ the natural surjective homomorphism $G \rightarrow G / K$ (where $G / K$ is a quotient group). These quotient groups are homomorphic images of $G$, and **every homomorphic image of $G$ is isomorphic to such a quotient group $G/K$**.

![cc](https://raw.githubusercontent.com/deltorobarba/repo/master/homomorphy.jpg)

**Then:**

1. **Then the kernel of $f$ is a normal subgroup of $G$.**
  * Normal subgroups are exactly the [kernels](https://de.m.wikipedia.org/wiki/Kern_(Algebra)) of group homomorphisms, so in particular the kernel of $f$ is a normal subgroup of $G$.

  * If $K$ is a **subset** of ker $(f)$ then there exists a unique homomorphism $h: G / K \rightarrow H$ such that $f=h \circ \varphi$. In other words, the natural projection $\varphi$ is universal among homomorphisms on $G$ that map $K$ to the identity element.

2. **Hence the quotient group $G /$ ker $f$ can be formed.**

3. **By the [Homomorphiesatz](https://de.wikipedia.org/wiki/Homomorphiesatz), this quotient group $G /$ ker $f$ is isomorphic to the image of $f$, which is a subgroup of $H$.**
  * The image of $f$ is isomorphic to the quotient group $G /$ ker ($f$). In particular, if $f$ is surjective then $H$ is isomorphic to $G$ / ker $(f)$. [Source](https://en.m.wikipedia.org/wiki/Isomorphism_theorems#First_Isomorphism_Theorem_4)
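The three steps can be checked by brute force on a small finite group. The sketch below uses a hypothetical example chosen only for illustration, the map $x \mapsto 3x$ on the integers modulo 12, and verifies that the cosets of the kernel correspond one-to-one to the elements of the image.

```python
n = 12
G = range(n)          # the group Z_12 under addition modulo 12

def f(x):
    """Hypothetical example homomorphism Z_12 -> Z_12, x |-> 3x (mod 12)."""
    return (3 * x) % n

# f respects the group operation (addition modulo 12).
is_hom = all(f((a + b) % n) == (f(a) + f(b)) % n for a in G for b in G)

kernel = frozenset(x for x in G if f(x) == 0)
image = {f(x) for x in G}

# The cosets a + ker(f) are the elements of the quotient group G / ker(f).
cosets = {frozenset((a + k) % n for k in kernel) for a in G}

# Induced map: each coset is sent to the common value of f on that coset.
induced = {}
for coset in cosets:
    values = {f(x) for x in coset}
    assert len(values) == 1          # f is constant on each coset (well-defined)
    induced[coset] = values.pop()

# Bijective onto the image: distinct cosets get distinct values, all of im(f).
is_bijection = (len(set(induced.values())) == len(cosets)
                and set(induced.values()) == image)
```

Here the kernel is {0, 4, 8}, the image is {0, 3, 6, 9}, and there are four cosets, matching $|G| / |\ker f| = 12 / 3 = 4 = |\operatorname{im} f|$.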

![cc](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Group_homomorphism_ver.2.svg/500px-Group_homomorphism_ver.2.svg.png)

*Image of a group homomorphism (h) from G (left) to H (right).*

*The smaller oval inside H is the image of h. N is the kernel of h and aN is a coset of N.* [Source](https://en.m.wikipedia.org/wiki/Group_homomorphism)

See also: https://mathepedia.de/Kern_und_Bild_Homomorphismus.html

*Diagram of the fundamental theorem on homomorphisms where f is a homomorphism, N is a normal subgroup of G and e is the identity element of G.*

![Image](https://upload.wikimedia.org/wikipedia/commons/thumb/2/25/Diagram_of_the_fundamental_theorem_on_homomorphisms.svg/440px-Diagram_of_the_fundamental_theorem_on_homomorphisms.svg.png)

https://en.m.wikipedia.org/wiki/Homological_algebra

https://www.cantorsparadise.com/an-intro-to-topology-9e0478313b63

https://quantum-journal.org/views/qv-2023-01-26-70/

Simplicial Homology (TDA)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1416.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1417.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1418.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1419.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1420.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1421.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1422.png)


![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1423.png)


Free abelian groups: https://youtu.be/xPeeFp_Hd3A

simplicial complexes: https://youtu.be/Uq4dTjHfLpI

computing homology groups: https://youtu.be/YNBi4Ix3cY0

computing more homology groups: https://youtu.be/l7QWg0UzBRA

betti numbers: https://youtu.be/NgrIPPqYKjQ

persistent homology: https://youtu.be/ktKCzMmDXDk

first isomorphism theorem: https://youtu.be/2kmIHyD8zTk

isomorphic graphs: https://youtu.be/EwV4Puk2coU

cohomology and forms: https://youtu.be/2ptFnIj71SM

differential forms: https://youtu.be/CYz_s82JnY8

covectors and one-forms: https://youtu.be/ziD8ewQjaf4

**Algebraic Homology explained via Graph and Group Theory**

* Find the nullspace of the matrix, then we can find all the cycles (min 29)

* The number of vectors in the spanning set of the null space of the reduced matrix representation would be the number of generating cycles

* A [free abelian group](https://en.m.wikipedia.org/wiki/Free_abelian_group) is an abelian group (commutative) with a basis.
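A minimal sketch of the null-space computation mentioned above, assuming the matrix in question is the vertex-by-edge boundary (incidence) matrix of an oriented graph. The function `nullspace`, the example graph and the use of exact rational arithmetic are choices made for this illustration, not the method from the lecture.

```python
from fractions import Fraction

def nullspace(matrix):
    """Basis of the null space of a matrix (list of rows) over the rationals,
    computed by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []                       # pivot column of each reduced row
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)       # one basis vector per free column
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][free]
        basis.append(vec)
    return basis

# Hypothetical oriented graph: triangle 0 -> 1 -> 2 -> 0 plus the detour 2 -> 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]     # (tail, head)
n_vertices = 4
boundary = [[0] * len(edges) for _ in range(n_vertices)]
for col, (tail, head) in enumerate(edges):
    boundary[head][col] += 1
    boundary[tail][col] -= 1

cycles = nullspace(boundary)          # each vector is a combination of edges
```

The number of basis vectors equals the number of independent cycles, here edges minus vertices plus components, 5 - 4 + 1 = 2.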

Source: [An introduction to homology | Algebraic Topology | NJ Wildberger](https://www.youtube.com/watch?v=ShWdSNJeuOg)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1391.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1392.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1394.png)

Now adding a higher dimensional disc:

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1395.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1396.png)

Continue: https://www.youtube.com/watch?v=2wn10l9qbJI&list=PL6763F57A61FE6FE8&index=36&t=1362s

Playlist: https://www.youtube.com/playlist?list=PL6763F57A61FE6FE8

**Simplicial Homology (Application of Quotient Groups)**

> <font color="blue">**If the quotient group Ker/Img = 0, the sequence is exact**

[Simplicial Homology: On Characterizing the Capacity of Neural Networks using Algebraic Topology](https://m0nads.wordpress.com/tag/persistent-homology/)

*Since Bn is a subgroup of Zn, we may form the quotient group Hn = Zn/Bn. What remains modulo Bn (the "remainders") captures the holes: the dimension of Hn is the number of n-dimensional holes (the invariants).*

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_07.png)

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_10.png)

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_08.png)

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_09.png)

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_06.png)

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_05.png)

https://www.intmath.com/differential-equations/3-integrable-combinations.php

Simple examples of homology:

https://en.m.wikipedia.org/wiki/Graph_homology

https://en.m.wikipedia.org/wiki/Simplicial_homology

In biology, [homology](https://en.m.wikipedia.org/wiki/Homology_(biology)) is similarity due to shared ancestry between a pair of structures or genes in different taxa. A common example of homologous structures is the forelimbs of vertebrates, where the wings of bats and birds, the arms of primates, the front flippers of whales and the forelegs of four-legged vertebrates like dogs and crocodiles **are all derived from the same ancestral tetrapod structure**.

https://de.wiktionary.org/wiki/homolog

Biology: **derivable from a common ancestral form**.

Hyphenation (German): ho·mo·log, not comparable

Example: The bend of a bird's wing is homologous to our wrist.

https://de.wiktionary.org/wiki/Homologie

Biology: correspondence of biological structures (organs, macromolecules, etc.) or behavioral components with respect to their evolutionary history rather than their function

**Homology**

* [Homology](https://en.m.wikipedia.org/wiki/Homology_(mathematics)) is a general way of **associating a sequence of algebraic objects, such as abelian groups (or modules), to other mathematical objects such as topological spaces.**

* **Homology Groups**: Homology groups were originally defined in algebraic topology. Similar constructions are available in a wide variety of other contexts, such as abstract algebra, groups, Lie algebras, Galois theory, and algebraic geometry. See [Construction of homology groups](https://en.m.wikipedia.org/wiki/Homology_(mathematics)#Construction_of_homology_groups)

* In homology we want to linearize the equivalence relation. Homology and cohomology are linear theories: they are easier to compute and the methods of linear algebra apply, though in the process you lose a bit of information.

* Homology counts components, holes, voids, etc.

* In topology we define something called homology for simplicial complexes.

* **Homology of a simplicial complex is computable via linear algebra**.

* Homology is a mathematical approach to formalizing the existence of holes.

* Certain "very fine" holes are invisible to homology; in that case one can sometimes fall back on the homotopy groups, which are harder to determine.

1. Define the set of vertices and the set of edges, and let V0 be the vector space with basis the vertices and V1 the vector space with basis the edges.

2. Linearize the equivalence relation: instead of looking at "tail of edge is equivalent to head of edge", we consider the difference between these two.

  * **e $\mapsto$ h(e) - t(e)** is a linear combination of two vertices, and hence an element of V0: e is mapped to the element h(e) - t(e).
  * **h(e) $\equiv$ t(e) mod (im d)**, that is: h(e) is identical to t(e) modulo the image of d.

* The linear map d : V1 -> V0 sends an edge e to h(e) - t(e).
* e is an edge, which is a basis element of V1.
* So the image of d (the differences) is just the subspace of V0 generated by these elements.
* Generate the [equivalence relation](https://en.m.wikipedia.org/wiki/Equivalence_relation) (compare the Homomorphiesatz): impose reflexivity, symmetry and transitivity.
* So to get the image of d you look at the subspace generated by the elements h(e) - t(e):
    * closure under addition more or less corresponds to transitivity,
    * closure under negation corresponds to symmetry, and
    * containing zero corresponds to reflexivity.
    * So this precisely linearizes the equivalence relation from before.
* Now we have one linear map, but we need two linear maps whose composite is 0 to get homology or cohomology. There are two easy ways to get a composite that is 0:
    * either start from 0, map to V1 and then go to V0 with d, or
    * start at V1, use d to get to V0 and then go to 0 with the zero map.
* Taking kernel modulo image: in the first case we get ker d modulo the image of the zero map (which is zero), i.e. ker d; in the second case we get V0 modulo im d.

* https://ncatlab.org/nlab/show/homology
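The claim that $V_0 / \operatorname{im} d$ linearizes the relation "tail is equivalent to head" can be checked numerically: the dimension of the quotient should equal the number of equivalence classes of vertices. The sketch below does this for a hypothetical graph; the helper `rank` and all variable names are assumptions of this example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank over the rationals via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical example graph: a triangle {0, 1, 2} plus a separate edge {3, 4}.
n_vertices = 5
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]       # (tail, head)

# Matrix of d : V1 -> V0, one column h(e) - t(e) per edge.
d = [[0] * len(edges) for _ in range(n_vertices)]
for col, (tail, head) in enumerate(edges):
    d[head][col] += 1
    d[tail][col] -= 1

rank_d = rank(d)
dim_h0 = n_vertices - rank_d       # dim(V0 / im d)
dim_h1 = len(edges) - rank_d       # dim(ker d)

# The same count taken directly from the equivalence relation (union-find).
parent = list(range(n_vertices))

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

for tail, head in edges:
    parent[find(tail)] = find(head)
n_classes = len({find(v) for v in range(n_vertices)})
```

Both routes give two classes (the triangle and the separate edge), and `dim_h1` is 1 because the triangle is the only independent cycle.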

**Homological Algebra**

* [Homological algebra](https://en.m.wikipedia.org/wiki/Homological_algebra) is the study of homological functors and the intricate algebraic structures that they entail; its development was closely intertwined with the emergence of category theory. A central concept is that of chain complexes, which can be studied through both their homology and cohomology.

*A diagram used in the snake lemma, a basic result in homological algebra:*

![gg](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/Snake_lemma_origin.svg/375px-Snake_lemma_origin.svg.png)


* Why use homological algebra: **to translate a problem of interest into a sequence of "higher" (= homological algebra) linear algebra problems**

* homological algebra is the **study of homological functors** and the intricate algebraic structures that they entail.

* One quite useful and ubiquitous concept in mathematics is that of **chain complexes**, which can be studied through both their homology and cohomology.

* Homological algebra affords the means to **extract information contained in these complexes and present it in the form of homological invariants of rings, modules, topological spaces**, and other 'tangible' mathematical objects. A powerful tool for doing this is provided by spectral sequences.

**Simplicial Homology**

* [Simplicial homology](https://de.m.wikipedia.org/wiki/Simpliziale_Homologie) is a method in algebraic topology, a branch of mathematics, that assigns to any simplicial complex **a sequence of [abelian groups](https://de.m.wikipedia.org/wiki/Abelsche_Gruppe)**.

* Intuitively, it counts the holes of different dimensions in the underlying space.

**Graph homology**

In algebraic topology and graph theory, [graph homology](https://en.m.wikipedia.org/wiki/Graph_homology) describes the homology groups of a graph, where the graph is considered as a topological space. It formalizes the idea of the number of "holes" in the graph. It is a special case of simplicial homology, as a graph is a special case of a simplicial complex. Since a finite graph is a 1-complex (i.e., its 'faces' are the vertices, which are 0-dimensional, and the edges, which are 1-dimensional), the only non-trivial homology groups are the 0th group and the 1st group.

The 1st homology group

Similarly, $\partial _{1}$ maps any cycle in C1 to the zero element of C0. In other words, the set of cycles in C1 **generates the null space (the kernel)** of $\partial _{1}$.

https://www.youtube.com/watch?v=ShWdSNJeuOg&t=240s

**Topological Graph Theory**

In mathematics, [topological graph theory](https://en.m.wikipedia.org/wiki/Topological_graph_theory) is a branch of graph theory. It studies the embedding of graphs in surfaces, spatial embeddings of graphs, and graphs as topological spaces.[1] It also studies immersions of graphs.

To an undirected graph we may associate an abstract simplicial complex C with a single-element set per vertex and a two-element set per edge.[2] The geometric realization |C| of the complex consists of a copy of the unit interval [0,1] per edge, with the endpoints of these intervals glued together at vertices. In this view, embeddings of graphs into a surface or as subdivisions of other graphs are both instances of topological embedding, homeomorphism of graphs is just the specialization of topological homeomorphism, the notion of a connected graph coincides with topological connectedness, and a connected graph is a tree if and only if its fundamental group is trivial.

Other simplicial complexes associated with graphs include the **Whitney complex or clique complex**, with a set per clique of the graph, and the matching complex, with a set per matching of the graph (equivalently, the clique complex of the complement of the line graph). The matching complex of a complete bipartite graph is called a chessboard complex, as it can be also described as the complex of sets of nonattacking rooks on a chessboard.



**Graph (topology)**

In topology, a branch of mathematics, a [graph is a topological space](https://en.m.wikipedia.org/wiki/Graph_(topology)) which arises from a usual graph $G=(E,V)$ by replacing vertices by points and each edge $e=xy\in E$ by a copy of the unit interval $I=[0,1]$, where $0$ is identified with the point associated to $x$ and $1$ with the point associated to $y$. That is, as topological spaces, graphs are exactly the simplicial 1-complexes and also exactly the one-dimensional **CW complexes**.

A **path** is a walk with no repeated vertices

A closed path is a **cycle**.

**Cycles (ker), boundaries (img) and chains**

* Cycles (the kernel of the boundary operator): geometrically, a cycle is a closed submanifold.

* Boundaries (the image of the boundary operator): a boundary is a cycle which is also the boundary of a submanifold.

* Boundary operator on [chains](https://en.m.wikipedia.org/wiki/Chain_(algebraic_topology)) (and its image): The boundary of a chain is the linear combination of boundaries of the simplices in the chain. The boundary of a k-chain is a (k−1)-chain. Note that the boundary of a simplex is not a simplex, but a chain with coefficients 1 or −1 – thus chains are the closure of simplices under the boundary operator.

![cc](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Chainline.svg/320px-Chainline.svg.png)

* *The boundary of a polygonal curve is a linear combination of its nodes; in this case, some linear combination of A1 through A6. Assuming the segments all are oriented left-to-right (in increasing order from Ak to Ak+1), the boundary is A6 − A1.*

* If the 1-chain $c=t_{1}+t_{2}+t_{3}$ is a path from point $v_{1}$ to point $v_{4}$, where $t_{1}=\left[v_{1}, v_{2}\right], t_{2}=\left[v_{2}, v_{3}\right]$ and $t_{3}=\left[v_{3}, v_{4}\right]$ are its constituent 1-simplices, then
$$
\begin{aligned}
\partial_{1} c &=\partial_{1}\left(t_{1}+t_{2}+t_{3}\right) \\
&=\partial_{1}\left(t_{1}\right)+\partial_{1}\left(t_{2}\right)+\partial_{1}\left(t_{3}\right) \\
&=\partial_{1}\left(\left[v_{1}, v_{2}\right]\right)+\partial_{1}\left(\left[v_{2}, v_{3}\right]\right)+\partial_{1}\left(\left[v_{3}, v_{4}\right]\right) \\
&=\left(\left[v_{2}\right]-\left[v_{1}\right]\right)+\left(\left[v_{3}\right]-\left[v_{2}\right]\right)+\left(\left[v_{4}\right]-\left[v_{3}\right]\right) \\
&=\left[v_{4}\right]-\left[v_{1}\right]
\end{aligned}
$$

![cc](https://upload.wikimedia.org/wikipedia/commons/thumb/0/02/Closed_polygonal_line.svg/320px-Closed_polygonal_line.svg.png)

* *A closed polygonal curve, assuming consistent orientation, has null boundary (this is why the boundary operator $\partial_1$ always sends closed objects to zero).*

* Example 2: The boundary of the triangle is a formal sum of its edges with signs arranged to make the traversal of the boundary counterclockwise.

   * Cycle: **A chain is called a cycle when its boundary is zero**.

   * Boundary: A chain that is the boundary of another chain is called a boundary.

   * **Boundaries are cycles**, so chains form a chain complex, whose homology groups (cycles modulo boundaries) are called simplicial homology groups.

* Example 3: Every 0-chain is a cycle, since the boundary of a point is zero. A 0-chain is a boundary exactly when, on each path-connected component, the sum of its coefficients is 0. Thus, the 0-homology group measures the number of path-connected components of the space.

* Example 4: The plane punctured at the origin has nontrivial 1-homology group **since the unit circle is a cycle, but not a boundary.**
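The telescoping computation for the path and the vanishing boundary of a closed curve can be reproduced with a few lines of Python. The representation of a 1-chain as a dictionary from oriented edges to integer coefficients, and the function name `boundary`, are assumptions made for this sketch.

```python
from collections import Counter

def boundary(chain):
    """Boundary of a 1-chain given as {(tail, head): coefficient}.
    Each oriented edge [a, b] contributes coefficient * ([b] - [a])."""
    result = Counter()
    for (tail, head), coeff in chain.items():
        result[head] += coeff
        result[tail] -= coeff
    # Drop vertices whose coefficients cancelled.
    return {v: c for v, c in result.items() if c != 0}

# The path v1 -> v2 -> v3 -> v4 from the worked equation.
path = {("v1", "v2"): 1, ("v2", "v3"): 1, ("v3", "v4"): 1}

# A closed triangle with consistent orientation.
loop = {("v1", "v2"): 1, ("v2", "v3"): 1, ("v3", "v1"): 1}

path_boundary = boundary(path)   # only the endpoints survive
loop_boundary = boundary(loop)   # everything cancels, so the loop is a cycle
```

The inner vertices of the path cancel in pairs, leaving $[v_4] - [v_1]$, while the closed loop has empty boundary.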

**Chain Complex**

A chain complex $(A_{\bullet}, d_{\bullet})$ is a sequence of abelian groups or modules ..., A0, A1, A2, A3, A4, ... connected by homomorphisms (called boundary operators or differentials)

> **A [chain complex](https://en.m.wikipedia.org/wiki/Chain_complex) is a sequence of homomorphisms of abelian groups**

A chain complex $V_{\bullet}$ is a sequence $\left\{V_{n}\right\}_{n \in \mathbb{Z}}$ of abelian groups or modules (for instance vector spaces) or similar, equipped with linear maps $\left\{d_{n}: V_{n+1} \rightarrow V_{n}\right\}$ such that $d^{2}=0$, i.e. the composite of two consecutive such maps is the zero morphism: $d_{n} \circ d_{n+1}=0$

* https://ncatlab.org/nlab/show/chain+complex

* https://ncatlab.org/nlab/show/zero+morphism

* https://m0nads.wordpress.com/tag/persistent-homology/

![vv](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Simplicial_homology_-_exactness_of_boundary_maps.svg/384px-Simplicial_homology_-_exactness_of_boundary_maps.svg.png)

* The boundary of a boundary of a 2-simplex (left) and the boundary of a 1-chain (right) are taken.

* Both are 0, being sums in which both the positive and negative of a 0-simplex occur once. The boundary of a boundary is always 0.

* A nontrivial cycle is something that closes up like the boundary of a simplex, in that its boundary sums to 0, but which isn't actually the boundary of a simplex or chain.

* Because trivial 1-cycles are equivalent to 0 in H1, the 1-cycle at right-middle is homologous to its sum with the boundary of the 2-simplex at left.
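That the boundary of a boundary is always 0 can be verified mechanically by multiplying consecutive boundary matrices. The sketch below does this for a full tetrahedron on the vertices 0 to 3; the helpers `boundary_matrix` and `matmul` and the sign convention $(-1)^i$ for omitting the i-th vertex are the usual textbook choices, written out here as assumptions of the example.

```python
from itertools import combinations

def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary map from k-simplices (columns) to their
    (k-1)-dimensional faces (rows). Simplices are sorted vertex tuples;
    omitting the i-th vertex contributes the sign (-1)**i."""
    index = {face: row for row, face in enumerate(faces)}
    matrix = [[0] * len(k_simplices) for _ in faces]
    for col, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]
            matrix[index[face]][col] = (-1) ** i
    return matrix

def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# All simplices of a tetrahedron, grouped by dimension 0..3.
vertices = (0, 1, 2, 3)
simplices = {k: list(combinations(vertices, k + 1)) for k in range(4)}

d1 = boundary_matrix(simplices[1], simplices[0])   # edges -> vertices
d2 = boundary_matrix(simplices[2], simplices[1])   # triangles -> edges
d3 = boundary_matrix(simplices[3], simplices[2])   # tetrahedron -> triangles

d1_d2 = matmul(d1, d2)   # boundary of the boundary of each triangle
d2_d3 = matmul(d2, d3)   # boundary of the boundary of the tetrahedron
```

Both products are zero matrices: every vertex (respectively edge) appears once with each sign and cancels, which is the matrix form of $d^2 = 0$.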

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_04.png)


[Comparison with the paper by Krishna 2017]

5 is congruent to 11 and 17 etc. modulo 3, because (3 * 1) + 2 = 5 and (3 * 3) + 2 = 11. The remainders must be identical.
* **Zp (kernel)**: 5, 11, 17 etc. (all objects of a filtration, e.g. in Cp, that are congruent to one another **because their difference is an integer multiple of Bp (the modulus)**).
* **Hp (homology)**: 2 (= the hole)
* **Bp (image)**: 3 (the modulus)
- three basis elements: 0, 1 and 2 (?)

Before:

5 is congruent to 11 modulo 3, because 3 * 1 + 2 = 5 and 3 * 3 + 2 = 11; the remainders must be identical, likewise for 17 etc.

- three basis elements: 0, 1 and 2
- 3 is like Bp, i.e. the image (an element of the vector space, and linear, because e.g. multiplying by a scalar stays in the vector space)
- the remainder would be Hp, i.e. 2 (the hole)
- 5, 11 and 17 are Zp (kernel)
- consider all objects from the filtration
- an element of Zp plus an element of Bp

![bb](https://raw.githubusercontent.com/deltorobarba/repo/master/simplicial_homology_01.png)

**Exact Sequences**

An [exact sequence](https://en.m.wikipedia.org/wiki/Exact_sequence) is a sequence of morphisms between objects (for example, groups, rings, modules, and, more generally, objects of an abelian category) such that the image of one morphism equals the kernel of the next.

In the context of group theory, a sequence

> $G_{0} \stackrel{f_{1}}{\longrightarrow} G_{1} \stackrel{f_{2}}{\longrightarrow} G_{2} \stackrel{f_{3}}{\longrightarrow} \cdots \stackrel{f_{n}}{\longrightarrow} G_{n}$

of groups and group homomorphisms is called exact if the image of each homomorphism is equal to the kernel of the next:

> $\operatorname{im}\left(f_{k}\right)=\operatorname{ker}\left(f_{k+1}\right)$

The sequence of groups and homomorphisms may be either finite or infinite.

> **Every exact sequence is a [chain complex](https://en.m.wikipedia.org/wiki/Chain_complex)**

**Comparison with an exact sequence: $\operatorname{im}(f_{k}) = \operatorname{ker}(f_{k+1})$, i.e. $Z_p = B_p$ with $H_p = 0$**
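As a concrete instance of the definition above (a standard textbook example, not taken from these notes), multiplication by 2 on the integers followed by reduction modulo 2 gives a short exact sequence:

> $0 \longrightarrow \mathbb{Z} \stackrel{\times 2}{\longrightarrow} \mathbb{Z} \stackrel{\bmod 2}{\longrightarrow} \mathbb{Z}/2\mathbb{Z} \longrightarrow 0$

Exactness can be checked position by position: multiplication by 2 is injective, so its kernel is the image $\{0\}$ of the first map; its image $2\mathbb{Z}$ is exactly the kernel of reduction mod 2; and reduction mod 2 is surjective, so its image is the kernel of the final map to $0$.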

![cc](https://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Illustration_of_an_Exact_Sequence_of_Groups.svg/640px-Illustration_of_an_Exact_Sequence_of_Groups.svg.png)

![ccc](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/KerIm_2015Joz_L2.png/640px-KerIm_2015Joz_L2.png)

**Betti Numbers (Betti Sequence)**

* https://de.wikipedia.org/wiki/Topologische_Invariante

* The formal definition of homology uses the language of group theory. (The equivalence classes of loops surrounding a hole have a group structure.) Persistent homology examines these homological features from a multiscale perspective.

* Persistent homology is a powerful tool to compute, study and encode efficiently multiscale topological features of nested families of simplicial complexes and topological spaces.

* It not only provides efficient algorithms to compute the Betti numbers of each complex in the considered families, as required for homology inference in the previous section, but also encodes the evolution of the homology groups of the nested complexes across the scales.

* In algebraic topology, the homologies, or rather the **homology groups, are invariants of a topological space**.

* **Simplicial homology groups and Betti numbers are topological invariants**: if K,K′ are two simplicial complexes whose geometric realizations are homotopy equivalent, then their homology groups are isomorphic and their Betti numbers are the same.

* Persistent Homology, a recent breakthrough idea, extends Homology theory to work across a range of parameterized Simplicial Complexes, like the one arising from a point cloud, instead of just a single, isolated complex.
It looks for topological invariants across various scales of a topological manifold.

* "But there is a problem. Betti numbers are computationally demanding to calculate, 'quickly overwhelming even the most powerful classical computers, even for not-so-large data sets.'"

* https://www.technologyreview.com/s/610138/a-small-scale-demonstration-shows-how-quantum-computing-could-revolutionize-data-analysis/

* In algebraic topology, the Betti numbers are used to distinguish topological spaces based on the connectivity of n-dimensional simplicial complexes. For the most reasonable finite-dimensional spaces (such as compact manifolds, finite simplicial complexes or CW complexes), the sequence of Betti numbers is 0 from some point onward (Betti numbers vanish above the dimension of a space), and they are all finite.

* **The nth Betti number is the rank of the nth homology group, denoted Hn. Informally, it tells us the maximum number of cuts that can be made before separating a surface into two pieces (counting 0-cycles, 1-cycles, etc.). These numbers are used today in fields such as simplicial homology, computer science and digital images**.

* https://en.wikipedia.org/wiki/Betti_number

* https://en.wikipedia.org/wiki/Homology_(mathematics)

* Betti numbers are a compact way to summarize these homology groups by recording properties of the topological space.

* It distinguishes topological spaces according to the connectivity of n-dimensional simplicial complexes. The nth Betti number represents the rank of the nth homology group, denoted as Hn.

* Informally, the nth Betti number refers to the number of n-dimensional holes on a topological surface.

* Figure 9 shows that the first three Betti numbers have the following definitions for 0-dimensional, 1-dimensional, and 2-dimensional simplicial complexes: β0 is the number of connected components, β1 is the number of one-dimensional holes, and β2 is the number of two-dimensional "voids".

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_437.png)

The Betti numbers indicate how many k-dimensional non-connected surfaces the corresponding topological space has. Intuitively, the first three Betti numbers say:

* b0 is the number of path-connected components (connected components)

* b1 is the number of "two-dimensional holes" (called one-dimensional or "circular" holes elsewhere in these notes).

* b2 is the number of three-dimensional cavities.

The torus shown below (meaning its surface) consists of one connected component, has two "two-dimensional holes" (one in the middle, the other in the interior of the torus) and has one three-dimensional cavity. The Betti numbers of the torus are therefore 1, 2, 1; all further Betti numbers are 0.

If the topological space under consideration is not an orientable compact manifold, however, this intuition already breaks down.


**Betti number & Rank of a group**

* In algebraic topology, the [Betti numbers](https://en.m.wikipedia.org/wiki/Betti_number) are used to distinguish topological spaces based on the connectivity of n-dimensional simplicial complexes. For the most reasonable finite-dimensional spaces (such as compact manifolds, finite simplicial complexes or CW complexes), the sequence of Betti numbers is 0 from some point onward (Betti numbers vanish above the dimension of a space), and they are all finite.

* The nth Betti number represents the [rank (of a group)](https://en.m.wikipedia.org/wiki/Rank_of_a_group) of the nth [homology group](https://en.m.wikipedia.org/wiki/Homology_(mathematics)), denoted Hn, **which tells us the maximum number of cuts that can be made before separating a surface into two pieces or 0-cycles, 1-cycles**, etc.

* The first few Betti numbers have the following definitions for 0-dimensional, 1-dimensional, and 2-dimensional simplicial complexes:

  * b0 is the number of (connected) components;
  * b1 is the number of one-dimensional or "circular" holes;
  * b2 is the number of two-dimensional "voids" or "cavities".

* for example, a torus has one connected surface component so b0 = 1, two "circular" holes (one equatorial and one meridional) so b1 = 2, and a single cavity enclosed within the surface so b2 = 1.

![gg](https://upload.wikimedia.org/wikipedia/commons/5/54/Torus_cycles.png)




**Homology Group & Betti Number**

Let $\sigma=\left(v_{0}, \ldots, v_{k}\right)$ be an oriented $k$ -simplex, viewed as a basis element of $C_{k}$. The boundary operator

$
\partial_{k}: C_{k} \rightarrow C_{k-1}
$

is the homomorphism defined by:

$
\partial_{k}(\sigma)=\sum_{i=0}^{k}(-1)^{i}\left(v_{0}, \ldots, \widehat{v_{i}}, \ldots, v_{k}\right)
$

where the oriented simplex

$
\left(v_{0}, \ldots, \widehat{v_{i}}, \ldots, v_{k}\right)
$

is the $i^{\text {th }}$ face of $\sigma,$ obtained by deleting its $i^{\text {th }}$ vertex.
In $C_{k},$ elements of the subgroup

$
Z_{k}:=\operatorname{ker} \partial_{k}
$

are referred to as cycles, and the subgroup

$
B_{k}:=\operatorname{im} \partial_{k+1}
$

is said to consist of boundaries.
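As a worked check of the boundary formula above (a sketch for a single oriented triangle $(v_0, v_1, v_2)$, chosen here purely for illustration), applying the boundary operator twice gives zero:

$
\partial_{2}(v_{0}, v_{1}, v_{2}) = (v_{1}, v_{2}) - (v_{0}, v_{2}) + (v_{0}, v_{1})
$

$
\partial_{1}\partial_{2}(v_{0}, v_{1}, v_{2}) = \big((v_{2}) - (v_{1})\big) - \big((v_{2}) - (v_{0})\big) + \big((v_{1}) - (v_{0})\big) = 0
$

Every boundary is therefore a cycle, i.e. $B_{k} \subseteq Z_{k}$, which is what makes a quotient of cycles by boundaries meaningful.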

The $k^{\text {th }}$ homology group $H_{k}$ of $S$ is defined to be the [quotient abelian group](https://en.m.wikipedia.org/wiki/Quotient_group)

$
H_{k}(S)=Z_{k} / B_{k}
$

* It follows that the **homology group $H_{k}(S)$ is nonzero exactly when there are $k$-cycles on $S$ which are not boundaries**. In a sense, this means that there are $k$-dimensional holes in the complex.

* For example, consider the complex $S$ obtained by gluing two triangles (with no interior) along one edge, shown in the image. The edges of each triangle can be oriented so as to form a cycle. These two cycles are by construction not boundaries (since every 2 -chain is zero).

* One can compute that the homology group $H_{1}(S)$ is isomorphic to $\mathbb{Z}^{2}$, with a basis given by the two cycles mentioned. This makes precise the informal idea that $S$ has two "1-dimensional holes".

* Holes can be of different dimensions. The rank of the $k$th homology group, the number

$
\beta_{k}=\operatorname{rank}\left(H_{k}(S)\right)
$

**is called the $k$th Betti number of $S$. It gives a measure of the number of $k$-dimensional holes in $S$.**
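The two-triangle example above can be checked numerically. The following is a minimal sketch, assuming rational coefficients (so that ranks are computed by Gaussian elimination) and hypothetical vertex labels `a, b, c, d`; it builds the boundary matrix of the 1-skeleton and reads off the Betti numbers from ranks.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over the rationals via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a pivot in this column at or below the current pivot row.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found

# Two hollow triangles abc and bcd glued along the edge bc (labels are illustrative).
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"), ("c", "d")]

# Boundary matrix of the 1-chains: rows are vertices, columns are edges, d(u, v) = v - u.
d1 = [[0] * len(edges) for _ in vertices]
for j, (u, v) in enumerate(edges):
    d1[vertices.index(u)][j] = -1
    d1[vertices.index(v)][j] = 1

rank_d1 = rank(d1)
rank_d2 = 0  # the complex has no 2-simplices, so there are no 2-chains

betti_0 = len(vertices) - rank_d1           # dim Z_0 - dim B_0
betti_1 = (len(edges) - rank_d1) - rank_d2  # dim Z_1 - dim B_1
print(betti_0, betti_1)
```

With these inputs the sketch reports one connected component and two independent 1-cycles, matching the statement that $S$ has two 1-dimensional holes.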

**A homology class is thus represented by a cycle which is not the boundary of any submanifold: the cycle represents a hole, namely a hypothetical manifold whose boundary would be that cycle, but which is "not there".**

* A homology class in Hn (which represents a hole) is an [equivalence class](https://en.m.wikipedia.org/wiki/Equivalence_class) of
    * [cycles](https://en.m.wikipedia.org/wiki/Simplicial_homology#Boundaries_and_cycles) Ker(δ) modulo boundaries Im(δ), i.e.
    * h(e) $\equiv$ t(e) mod (im d)

* Ker(δ) can be described by: h(e) $\equiv$ t(e), i.e. e $\mapsto$ h(e) - t(e), or: ∂n−1 ◦ ∂n = 0

* This is a linear combination of two vertices, and hence an element of $C_0$

* The image of d, i.e. Im(δ), is obtained by looking at the subspace generated by e $\mapsto$ h(e) - t(e)

*  Im(δ): boundaries?


https://en.m.wikipedia.org/wiki/Simplicial_homology#Boundaries_and_cycles

https://ncatlab.org/nlab/show/boundary+of+a+simplex

https://ncatlab.org/nlab/show/chain+map

https://en.m.wikipedia.org/wiki/Spectral_sequence

**Fundamental group**

In the mathematical field of algebraic topology, the [fundamental group](https://en.m.wikipedia.org/wiki/Fundamental_group) of a topological space is the group of the equivalence classes under homotopy of the loops contained in the space. It records information about the basic shape, or holes, of the topological space.

See also: [Graph (topology)](https://en.m.wikipedia.org/wiki/Graph_(topology))

If $X$ is a graph and ${\displaystyle T\subseteq X}$ a maximal tree, then the fundamental group $\pi _{1}(X)$ equals the free group generated by elements ${\displaystyle (f_{\alpha })_{\alpha \in A}}$, where the ${\displaystyle \{f_{\alpha }\}}$ correspond bijectively to the edges of ${\displaystyle X\setminus T}$; in fact, $X$ is homotopy equivalent to a wedge sum of circles.
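The statement above suggests a simple computation: for a connected graph, the free generators of $\pi_1(X)$ correspond to the edges left over after a spanning tree has been chosen. The sketch below assumes a connected graph given as a hypothetical edge list on vertices `0..n-1` and uses a union-find structure to grow the tree; the function name `non_tree_edges` is illustrative only.

```python
def non_tree_edges(num_vertices, edges):
    """Return the edges outside the spanning tree grown greedily in edge order."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    outside = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            outside.append((u, v))  # edge closes a loop: one free generator
        else:
            parent[root_u] = root_v  # edge joins two components: part of the tree
    return outside

# Two triangles sharing the edge (1, 2); the vertex numbering is an assumption.
two_triangles = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
generators = non_tree_edges(4, two_triangles)
print(len(generators), generators)
```

The number of leftover edges is the rank of the free group, so the graph is homotopy equivalent to a wedge of that many circles. A different edge order picks a different tree but the same count.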

*Topological Data Analysis*

https://www.spektrum.de/news/topologische-datenanalyse-big-data-fuer-quantencomputer/2128650

* The physicist also has a conjecture as to why quantum computers handle this kind of task better. There could be a previously unexpected connection between quantum mechanics and TDA: supersymmetry.

* For the quantum algorithm to run exponentially faster than classical methods (the usual yardstick for a quantum advantage), **the number of high-dimensional holes in the data sets must be unimaginably large, on the order of trillions**.

* "Such conditions are hard to find in the real world," says Cade. **It is unclear whether such data exist at all**, according to Ryan Babbush, one of the lead authors of the Google study.

* Ewin Tang, who is now doing her PhD at the University of Washington, does not consider TDA to be the practical quantum application that computer scientists are looking for. She expects that **quantum computers will most likely be useful for learning about quantum systems, and not for analyzing classical data**.

* A new creative approach could at any time achieve what Tang and her colleagues have not managed so far: designing an efficient TDA method that runs on ordinary computers. "I would bet neither my house nor my car nor my cat that this won't happen," says Dunjko.

https://www.nature.com/articles/ncomms10138

**Persistent Barcode, Diagram & Landscape**

* What is the ideal size of d? Consider all distances d between d1 (the minimum needed to connect two balls) and d2 (the maximum, at which everything is connected). Each hole appears at a particular value of d and disappears at another value of d. We can represent the persistence of this hole as a pair (d1, d2).
* From this pair we get a bar. Several holes result in a barcode. Short bars represent noise. Long bars are features.
* Persistent barcodes are stable with respect to perturbations of the data (Edelsbrunner 2007).
* The barcode is computable via linear algebra. The runtime is O(n³), i.e. cubic, where n is the number of simplices (Carlsson 2005).
* A barcode is a visualization of an algebraic structure.
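To make the birth/death pairs concrete, here is a minimal sketch for the simplest case only: the 0-dimensional barcode (connected components) of points on a line. It assumes that two points become connected once d reaches their distance, and it uses union-find rather than the general matrix reduction mentioned above; the function name `barcode_h0` and the sample points are illustrative.

```python
from itertools import combinations

def barcode_h0(points):
    """Bars (birth, death) of connected components as the scale d grows."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances, processed from the smallest scale upwards.
    pairs = sorted((abs(points[i] - points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for dist, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, dist))  # two components merge: one bar ends here
    bars.append((0.0, float("inf")))  # the last component never disappears
    return bars

points = [0.0, 1.0, 1.5, 5.0]
bars = barcode_h0(points)
for birth, death in bars:
    print(birth, death)
```

In this toy data set the two short bars come from the tight cluster near 0 to 1.5, while the longer finite bar records the gap to the isolated point at 5.0, which is the kind of long bar that would be read as a feature rather than noise.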

![vvv](https://raw.githubusercontent.com/deltorobarba/repo/master/homology_06.jpg)

> **Homology of a simplicial complex is computable via linear algebra**


![vvv](https://raw.githubusercontent.com/deltorobarba/repo/master/homology_07.jpg)

**Topological Data Analysis + Machine Learning**

![xxx](https://raw.githubusercontent.com/deltorobarba/repo/master/tda_01.jpg)

**Hopf Fibration**

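Map from the 3-sphere $S^3$ (the unit sphere in 4D space) onto the 2-sphere $S^2$ (the unit sphere in 3D space), whose fibers are circles $S^1$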

https://de.m.wikipedia.org/wiki/Hopf-Faserung

https://en.m.wikipedia.org/wiki/Principal_bundle

https://en.m.wikipedia.org/wiki/Group_action#Orbit-stabilizer_theorem

https://de.m.wikipedia.org/wiki/Villarceau-Kreise

https://youtu.be/-lZTnEgbVvI

https://youtu.be/nsHcKO7HvFY
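A minimal numerical sketch of the Hopf map, assuming the standard coordinate formula with a point of $S^3$ written as a pair of complex numbers $(z_1, z_2)$, $|z_1|^2 + |z_2|^2 = 1$ (this formula is not stated in the notes above; the function name `hopf_map` is illustrative). It checks two properties: the image lies on the unit sphere in 3D, and multiplying both coordinates by a common phase, i.e. moving along a circle fiber, does not change the image.

```python
import cmath

def hopf_map(z1, z2):
    """Send (z1, z2) on the unit 3-sphere to a point (x, y, z) in 3D space."""
    w = z1 * z2.conjugate()
    return (2 * w.real, 2 * w.imag, abs(z1) ** 2 - abs(z2) ** 2)

# A sample point on the 3-sphere: |z1|^2 + |z2|^2 = 0.36 + 0.64 = 1.
z1 = complex(0.6, 0.0)
z2 = complex(0.0, 0.8)
p = hopf_map(z1, z2)

# Rotate both coordinates by the same phase: this stays on the same fiber.
phase = cmath.exp(1j * 0.7)
q = hopf_map(phase * z1, phase * z2)

norm_squared = sum(c * c for c in p)
print(p, norm_squared)
print(max(abs(a - b) for a, b in zip(p, q)))
```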

###### *Group Theory*

Galois Group: https://youtu.be/Ct2fyigNgPY

https://en.m.wikipedia.org/wiki/Group_theory

Representation theory & Group representation

[Representation theory: The ‘Useless’ Perspective That Transformed Mathematics](https://www.quantamagazine.org/the-useless-perspective-that-transformed-mathematics-20200609/)


https://en.m.wikipedia.org/wiki/Representation_theory

https://en.m.wikipedia.org/wiki/Group_representation

**Congruence Classes, Equivalence Classes & Residue Classes (Modular Arithmetic)**

* **Congruence**: [Kongruenz (Zahlentheorie)](https://de.m.wikipedia.org/wiki/Kongruenz_(Zahlentheorie)). Congruence between two integers is defined with respect to a divisor. **In this context the divisor is called the modulus.**
  * Let a modulus $m \in \mathbb{N}$ be given. Two integers $a$ and $b$ are called congruent modulo $m$ **if dividing $a$ and $b$ by $m$ leaves the same remainder $r$**.
  * Congruence is a kind of extension of modular arithmetic. **Modular arithmetic**: 17 mod 3 = 2.
  * **Congruence: 11 $\equiv$ 17 mod 3 (because both leave remainder 2, so both are in the same residue class)**
  * As an alternative to [**determining residue classes**](https://de.m.wikipedia.org/wiki/Restklasse), one can also say that two numbers are congruent (modulo the natural number n) if their difference is divisible by n. Here: 17 - 11 = 6, which is divisible by 3.

* [**Equivalence and equivalence classes** (Congruence Classes)](https://de.m.wikipedia.org/wiki/Äquivalenzrelation). Equivalence: objects that are alike in a particular respect are regarded as equivalent. The result of the modulo operation identifies an equivalence class. Any member of the class may be chosen as representative.

  * Of particular interest are those equivalence relations $\equiv$ whose quotient map $q_{\equiv}: A \rightarrow A / \equiv, a \mapsto[a]_{\equiv}$ **is compatible with the structure on $A$, i.e. is a homomorphism**, because then the structure induced by $q_{\equiv}$ on the [quotient set](https://de.wikipedia.org/wiki/Äquivalenzrelation#Quotientenmenge_und_Partition) $A / \equiv$ is of the same kind as that of $A$. **Such an equivalence relation $\equiv$ is called a congruence relation on the structured set $A$.** <font color="blue">The quotient group G/N is a homomorphic image of the group G.</font>

* **[Residue classes](https://de.wikipedia.org/wiki/Restklasse)** are the equivalence classes of the congruence relation. The residue class of a number a modulo a number m is the set of all numbers that leave the same remainder as a when divided by m. See also [Modulo Operation](https://en.m.wikipedia.org/wiki/Modulo_operation) (residue class arithmetic).

* In group theory, the equivalence classes determined by a subgroup are called [cosets (Nebenklassen)](https://de.m.wikipedia.org/wiki/Gruppentheorie#Nebenklassen).
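The two equivalent tests for congruence described above (same remainder, or difference divisible by the modulus) can be sketched in a few lines. The function names are illustrative, and the residue class is only listed within a finite window, since the full class is infinite.

```python
def congruent_by_remainder(a, b, m):
    """a and b leave the same remainder on division by m."""
    return a % m == b % m

def congruent_by_difference(a, b, m):
    """The difference a - b is divisible by m."""
    return (a - b) % m == 0

def residue_class(a, m, low, high):
    """Members of the residue class of a modulo m inside [low, high]."""
    return [x for x in range(low, high + 1) if (x - a) % m == 0]

print(17 % 3)                                # remainder of 17 on division by 3
print(congruent_by_remainder(11, 17, 3))     # same remainder
print(congruent_by_difference(11, 17, 3))    # 17 - 11 = 6 is divisible by 3
print(residue_class(2, 3, 0, 17))            # the class containing 5, 11 and 17
```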

**Subgroups, Normal Subgroups & Cosets**

* **Subgroups (Untergruppen):**

  * Let $\mathbb{Z}$ be the group of integers with addition as the group operation. This group has infinitely many **[subgroups (Untergruppen)](https://de.m.wikipedia.org/wiki/Untergruppe)**, for example 2$\mathbb{Z}$, 3$\mathbb{Z}$, 4$\mathbb{Z}$, 5$\mathbb{Z}$, 6$\mathbb{Z}$, 7$\mathbb{Z}$.

  * Example: the integers $\mathbb {Z} $ under addition are a subgroup of the rational numbers $\mathbb {Q} $.

  * Every nontrivial group has at least two subgroups: the trivial subgroup {e}, consisting only of the identity element, and the entire group G. Both are **normal subgroups (Normalteiler)**.

  * If a nontrivial group has no normal subgroups other than these two, it is called a **simple group**. A simple group has no quotient (factor) groups other than the trivial group and itself, but simple groups are the building blocks of other groups.

* **Normal Subgroups (Normalteiler)**:

  * <font color="blue">**The equivalence class with remainder zero is the normal subgroup**. Normal subgroups are special subgroups, and their importance lies above all in the fact **that they are exactly the kernels of group homomorphisms** (= the elements mapped to zero, i.e. zero as remainder).</font> Normal subgroups are also called "invariant" or "self-conjugate" subgroups.

  * Normal subgroups determine what kinds of homomorphisms $f : G \to H$ are possible from a group $G$ to other groups $H$.

  * Trivial examples: the trivial subgroup {e} and the entire group G.

  * The group $\mathbb{Z}$ is **abelian** (commutative), and hence every subgroup is also a **[Normalteiler](https://de.wikipedia.org/wiki/Normalteiler) or [Normal Subgroup](https://en.wikipedia.org/wiki/Normal_subgroup)**.

* **Cosets (Nebenklassen)**:

  * Cosets are used to prove [Lagrange's theorem](https://de.m.wikipedia.org/wiki/Satz_von_Lagrange), to explain the notions of [Normal Subgroup (Normalteiler)](https://de.m.wikipedia.org/wiki/Normalteiler) and [Quotient Group (Faktorgruppe)](https://de.m.wikipedia.org/wiki/Faktorgruppe), and to study group actions.

  * In contrast to subgroups, a coset other than the subgroup itself is not closed under the group operation, is not closed under inverses and does not contain the identity element.

  * Cosets restrict which subgroup sizes are possible in a finite group: by Lagrange's theorem, the order of a subgroup divides the order of the group. There are **two types of cosets**: left cosets and right cosets.

  * A subgroup $H$ of a group $G$ may be used to decompose the underlying set of $G$ into disjoint, equal-size subsets called [cosets](https://en.m.wikipedia.org/wiki/Coset)

* <font color="blue">**Example: Use the normal subgroup 5$\mathbb{Z}$ of $\mathbb{Z}$ to divide the group $\mathbb{Z}$ into cosets. We get 5 sets of remainders (the congruence classes), which form the [Quotient Group (Faktorgruppe)](https://en.m.wikipedia.org/wiki/Quotient_group), a group with 5 elements (the integers mod 5): $\mathbb{Z}$ mod 5 = {$\overline{0}$, $\overline{1}$, $\overline{2}$, $\overline{3}$, $\overline{4}$}.**</font>

    * 5$\mathbb{Z}$ + remainder 0 = $\overline{0} : \{\ldots-10,-5,0,5,10 \ldots\}$ **(Normal) Subgroup**, which is technically also the coset 0 + 5$\mathbb{Z}$

    * 5$\mathbb{Z}$ + remainder 1 = $\overline{1} : \{\ldots-9,-4,1,6,11 \ldots\}$ **Coset 1 + 5$\mathbb{Z}$**

    * 5$\mathbb{Z}$ + remainder 2 = $\overline{2} : \{\ldots-8,-3,2,7,12 \ldots\}$ **Coset 2 + 5$\mathbb{Z}$**

    * 5$\mathbb{Z}$ + remainder 3 = $\overline{3} : \{\ldots-7,-2,3,8,13 \ldots\}$ **Coset 3 + 5$\mathbb{Z}$**

    * 5$\mathbb{Z}$ + remainder 4 = $\overline{4} : \{\ldots-6,-1,4,9,14 \ldots\}$ **Coset 4 + 5$\mathbb{Z}$**

* **Quotient Group (Faktorgruppe)**:

  * The factor group or quotient group is denoted $G / N$ and is the set of cosets. A quotient group $G/N$ can be formed from a group $G$ and any of its normal subgroups $N$.

    * The quotient group partitions a set into equivalence classes, or a group into residue classes / cosets. This set of residue classes (equivalence classes) is called the **[quotient set](https://de.wikipedia.org/wiki/Äquivalenzrelation#Quotientenmenge_und_Partition) or factor set**.

  * These [quotient groups](https://de.wikipedia.org/wiki/Faktorgruppe) are homomorphic images of G, **and every homomorphic image of G is isomorphic to such a quotient group G/N**. See the fundamental theorem on homomorphisms of groups: "**Every homomorphic image of a group is isomorphic to a quotient group**". [Source](https://en.m.wikipedia.org/wiki/Fundamental_theorem_on_homomorphisms)

  * For finite groups you can find a chain of subgroups, each normal in the next, called a "composition series", which acts as a kind of 'prime factorization' of the group (1 ◃ N1 ◃ N2 ◃ ... ◃ Nr ◃ G). Normal subgroups can also be used to study fields (Körper), e.g. in Galois theory (field extension K / F).
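The coset example for 5$\mathbb{Z}$ in $\mathbb{Z}$ can be reproduced with a short sketch. Because each coset is infinite, the code lists it only inside a finite window (an assumption made for display), and it then checks on that window that adding cosets via representatives does not depend on which representatives are picked. The names `coset` and `add_cosets` are illustrative.

```python
def coset(r, n=5, window=range(-10, 15)):
    """Members of the coset r + nZ that fall inside the display window."""
    return [x for x in window if x % n == r]

def add_cosets(r, s, n=5):
    """Add two cosets of nZ by adding representatives and reducing mod n."""
    return (r + s) % n

cosets = {r: coset(r) for r in range(5)}
for r, members in cosets.items():
    print(r, members)

# Well-definedness: any choice of representatives lands in the same coset.
well_defined = all(
    (x + y) % 5 == add_cosets(r, s)
    for r in range(5) for s in range(5)
    for x in cosets[r] for y in cosets[s]
)
print(well_defined)
```

That the sum of two cosets is again a single coset, independent of representatives, is exactly what lets the five cosets form the quotient group.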


**Centralizer and Normalizer**

https://en.m.wikipedia.org/wiki/Centralizer_and_normalizer

In group theory, a branch of mathematics, the normalizer of a subset of a group is a certain associated subgroup.

Suppose we have a group G and a subset S of G. The normalizer of S in G is the set of all elements g of G that commute with S as a whole set, i.e. gS = Sg, even if g does not commute with each individual element of S. Equivalently, the normalizer is the largest subgroup of G whose elements all map S onto itself under conjugation.

Formally, the normalizer of S in G, denoted by $N_G(S)$, is defined as:

$N_G(S) = \{g \in G : gSg^{-1} = S\}$

Here $gSg^{-1} = S$ means that the set $\{gsg^{-1} : s \in S\}$ equals S: for every element s in S the conjugate $gsg^{-1}$ is again an element of S, and every element of S arises in this way. In words, the normalizer of S in G consists of all elements g of G such that when any element of S is conjugated by g (i.e., you multiply it by g on the left and by the inverse of g on the right), the result is still an element of S.

In simple terms, the normalizer of a subset S of a group G is the largest subgroup of G in which S "behaves nicely", in the sense that the action of conjugation by elements of the subgroup leaves S unchanged. The concept of the normalizer is particularly important in the study of the structure and symmetries of groups.


**Normalizer: conjugating by its elements maps the set S back onto S**

Consider a simple example with a finite group. Take the symmetric group S_3 (the group of all permutations of three elements), which has six elements: {e, (12), (13), (23), (123), (132)} where e is the identity permutation, (12) swaps elements 1 and 2, etc.

Now let's consider the subset S = {(12), e}.

We want to find the normalizer of this set. **By the definition of the normalizer, it consists of all elements g in S_3 such that when any element of S is conjugated by g, the result is still an element of S**. So we need to find all elements g in S_3 such that g(12)g^(-1) is in S and g(e)g^(-1) is in S.

Well, conjugating the identity e by any element gives e back, so that part is trivial.

So let's look at the conjugation of (12) by each element of S_3:

- Conjugating by e gives (12), which is in S.
- Conjugating by (12) gives (12), which is in S.
- Conjugating by (13) gives (23), which is not in S.
- Conjugating by (23) gives (13), which is not in S.
- Conjugating by (123) gives (13), which is not in S.
- Conjugating by (132) gives (23), which is not in S.

So, the normalizer of S in S_3 is {e, (12)}. This is the largest subgroup of S_3 in which S is a subset that commutes with each element of the subgroup under the operation of conjugation.
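The computation above can be verified by brute force. The sketch below assumes permutations are stored as tuples on the points 0, 1, 2 (so the transposition (12) becomes the tuple `(1, 0, 2)`) and composes them right to left; with the opposite composition convention the individual conjugates of the 3-cycles are swapped, but the resulting normalizer is the same.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q: first apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(g, s):
    """The conjugate g s g^(-1)."""
    return compose(compose(g, s), inverse(g))

group = list(permutations(range(3)))   # all six elements of S_3
identity = (0, 1, 2)
swap_12 = (1, 0, 2)                    # the transposition (12) on points 0, 1, 2
subset = {identity, swap_12}

# g belongs to the normalizer when conjugation by g maps the subset onto itself.
normalizer = {g for g in group
              if {conjugate(g, s) for s in subset} == subset}
print(sorted(normalizer))
```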

*Generators of a group*

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_0942.jpg)

> **Each group SU(3) x SU(2) x U(1) leads to a symmetry resulting in a conservation law**

**Group Operations: Summary of algebraic structures and their operations**

$+$ $\quad$ $-$ $\quad$ is a Group

$+$ $\quad$ $-$ $\quad$ $\cdot$ $\quad$ is a Ring

$+$ $\quad$ $-$ $\quad$ $\cdot$ $\quad$ $÷$ $\quad$ is a Field


A group is a set $G$ with an operation $*$ such that

1. **Closure** [Abgeschlossenheit](https://de.m.wikipedia.org/wiki/Abgeschlossenheit_(algebraische_Struktur)): If $x$ and $y$ are in $G$ then $x * y$ is in $\mathrm{G}$
  * up to here it is a [Magma (Groupoid)](https://de.m.wikipedia.org/wiki/Magma_(Mathematik))
  * Examples (magmas that are not semigroups):
    * $(\mathbb{Z},-)$: the integers with subtraction
    * $(\mathbb{R} \backslash\{0\}, /)$: the nonzero real numbers with division
    * The natural numbers with exponentiation, i.e. with the operation $a * b=a^{b}$
    * The real numbers with the arithmetic mean as the operation

2. **Associativity** [Assoziativität](https://de.m.wikipedia.org/wiki/Assoziativgesetz): For all $x, y, z$ in $G$ $\text { we have }(x * y) * z=x *(y * z)$
  * up to here it is a [semigroup (Halbgruppe)](https://de.m.wikipedia.org/wiki/Halbgruppe)
  * Example: the set $\mathbb{N}_0$ = {0, 1, 2, ...} of natural numbers with ordinary addition forms a commutative and cancellative semigroup ($\mathbb{N}_0$,+) that is not a group. Since the negative numbers are missing here, i.e. "half" of the abelian group ($\mathbb Z,+$) of the integers, the name semigroup (German: Halbgruppe, "half group") suggested itself for this mathematical structure.

3. **Identity Element** [Neutrales Element](https://en.m.wikipedia.org/wiki/Identity_element): There is an element $e$ in $G$ such that $e * x=x * e=x$ for all $x$ in $G$
  * up to here it is a [Monoid](https://de.m.wikipedia.org/wiki/Monoid)
  * Example: the natural numbers with addition and the number 0 as identity element.

4. **Inverse Elements** [Inverse Elemente](https://de.m.wikipedia.org/wiki/Inverses_Element): For each element $x$ in $G,$ there is an element $x^{-1}$ such that $x * x^{-1}=x^{-1} * x=e$
  * up to here it is a [Group](https://de.m.wikipedia.org/wiki/Gruppe_(Mathematik))
  * Example: the set of integers together with addition
  * Rings, fields (Körper), modules and vector spaces are groups with additional structure and properties

5. **Commutativity** [Kommutativität](https://de.m.wikipedia.org/wiki/Kommutativgesetz): For all $a, b \in G$ we have $a * b=b * a$
  * up to here it is an [Abelian Group](https://de.m.wikipedia.org/wiki/Abelsche_Gruppe)
  * Examples:
    * $(\mathbb {Z} ,+)$ is the most important abelian group. Here $\mathbb{Z}$ is the set of integers and + is ordinary addition.
    * $(\mathbb {Q}^{\cdot} , \cdot)$ is an abelian group. Here $\mathbb {Q}^{\cdot}$ is the set of rational numbers without 0 and ⋅ is ordinary multiplication. Zero must be excluded because it has no inverse element: "1/0" is not defined.
    * The set of translations of the Euclidean plane forms an abelian group. The operation is the composition of translations.
    * The set of rotations of a plane about a point forms an abelian group. The operation is the composition of rotations.
    * The set of rotation-dilations (Drehstreckungen) of a plane forms an abelian group.
    * The set of finite decimals is not an abelian group under multiplication. For example, the number 3 has no multiplicative inverse: $\displaystyle {\frac {1}{3}}$ cannot be written as a finite decimal. Under ordinary addition, the finite decimals do form an abelian group.
  * [Free Abelian Group](https://de.m.wikipedia.org/wiki/Freie_abelsche_Gruppe): an abelian group that has a basis as a $\mathbb {Z}$-module. If G is a free abelian group, one chooses a basis and expresses all elements as linear combinations of elements of this basis. It should be stressed, however, that there is usually no distinguished basis.

6. **Distributive law** [Distributivgesetz](https://de.m.wikipedia.org/wiki/Distributivgesetz): a(b + c) = ab + ac.
  * If the distributive law holds in addition, the structure is a ring (two operations are needed for this). Groups always have only one operation.
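The hierarchy built up by axioms 1 to 5 can be tested by brute force on small finite sets. The following sketch uses a hypothetical helper `classify` that returns the strongest label satisfied; the finite examples (arithmetic modulo 5 and 6, a left projection, and S_3) are assumptions chosen as finite stand-ins for the infinite examples in the list above.

```python
from itertools import permutations, product

def classify(elements, op):
    """Return the strongest structure name that (elements, op) satisfies."""
    elements = list(elements)
    pairs = list(product(elements, repeat=2))
    if any(op(x, y) not in elements for x, y in pairs):
        return "not closed"
    if any(op(op(x, y), z) != op(x, op(y, z))
           for x, y, z in product(elements, repeat=3)):
        return "magma"
    identities = [e for e in elements
                  if all(op(e, x) == x and op(x, e) == x for x in elements)]
    if not identities:
        return "semigroup"
    e = identities[0]
    # Every element needs a two-sided inverse with respect to e.
    if not all(any(op(x, y) == e and op(y, x) == e for y in elements)
               for x in elements):
        return "monoid"
    if any(op(x, y) != op(y, x) for x, y in pairs):
        return "group"
    return "abelian group"

results = {
    "Z5 with +": classify(range(5), lambda a, b: (a + b) % 5),
    "Z5 with -": classify(range(5), lambda a, b: (a - b) % 5),
    "Z6 with *": classify(range(6), lambda a, b: (a * b) % 6),
    "left projection": classify(range(2), lambda a, b: a),
    "S3": classify(permutations(range(3)),
                   lambda p, q: tuple(p[i] for i in q)),
}
for name, label in results.items():
    print(name, "->", label)
```

Each label is reached only after all earlier checks pass, which mirrors the way the numbered axioms above are stacked.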


**Applications of Group Theory**

* Solutions to polynomial equations, such as finding the 2 roots of a quadratic equation. There is also a cubic formula, and a quartic formula for degree 4 polynomials. For a general formula solving degree 5 polynomials, group theory showed that it doesn't exist. This has to do with the permutation group S5.

* Connection to Physics - Noether's theorem: Conservation law - symmetry. Momentum - translation in space. Energy - translation in time.

> [Researchers Use Group Theory to Speed Up Algorithms — Introduction to Groups](https://www.youtube.com/watch?v=KufsL2VgELo&list=WL&index=3&t=1579s)

![ggg](https://raw.githubusercontent.com/deltorobarba/repo/master/groups.png)

**Trivial Group (Zero Group)**

The [trivial group](https://de.m.wikipedia.org/wiki/Triviale_Gruppe) is, in group theory, a group whose underlying set contains exactly one element. The trivial group is unique up to isomorphism. **Every group contains the trivial group as a subgroup**.

The trivial group $(\{e\}, *)$ is a group consisting of the one-element set $\{e\}$, equipped with the only possible group operation

$
e * e=e
$

The element $e$ is therefore the **identity element** of the group.

All trivial groups are isomorphic to each other. Examples of trivial groups are:

* the cyclic group $C_{1}$ of degree 1

* the alternating group $A_{2}$ of degree 2

* the symmetric group $S_{1}$ of a one-element set

*Properties of trivial groups*:

* Since the group operation $\ast$ is commutative, the trivial group is an abelian group.

* The only subgroup of the trivial group is the trivial group itself.

* The trivial group is generated by the empty set:
$\{e\}=\langle \emptyset \rangle$ . Here the empty product yields, by the usual convention, the identity element.

* Every group contains the trivial group and itself as (trivial) normal subgroups. **The trivial group is therefore usually not regarded as a simple group** (a simple group has exactly 2 normal subgroups).

* In the category of groups Grp, the trivial group serves as the zero object.

**Simple Group**

* [Simple Groups](https://en.wikipedia.org/wiki/Simple_group), or [einfache Gruppen](https://de.wikipedia.org/wiki/Einfache_Gruppe_(Mathematik)), **are the fundamental building blocks of finite groups** (just like prime numbers are fundamental building blocks in number theory)

* Just as you can factor integers into prime numbers, you can break apart some groups into a direct product of simpler groups.

*  **a simple group is a <u>nontrivial</u> group whose only normal subgroups are the trivial group and the group itself.**

* **Every group has itself and the set containing only the identity element as normal subgroups.**

This raises the question of which groups have no further normal subgroups. By definition, these are exactly the simple groups.

  * A group $G$ is called simple if its only normal subgroups are $G$ and $\{e\}$, where $e$ is the identity element.
  * In addition, $G \neq\{e\}$ is required, so that one can say more concisely:
  * **A group is called simple if it has exactly two normal subgroups.**

* A group that is not simple can be broken into two smaller groups, namely a nontrivial [normal subgroup](https://en.wikipedia.org/wiki/Normal_subgroup) and the corresponding quotient group. This process can be repeated, and for finite groups one eventually arrives at uniquely determined simple groups, by the [Jordan–Hölder theorem (Composition series)](https://en.wikipedia.org/wiki/Composition_series).

* The complete classification of finite simple groups, completed in 2004, is a major milestone in the history of mathematics.

**Since 1982 the finite simple groups have been regarded as completely classified; the list consists of**

* the cyclic groups of prime order,

* the alternating groups $A_{n}$ with $n\geq 5$,

* the groups of Lie type (16 infinite series),

* 26 sporadic groups (the finite simple groups that do not fit into any of the 18 systematic families with infinitely many members).
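The first family in the list can be checked by brute force. The following is a minimal sketch, assuming the cyclic group $\mathbb{Z}_n$ is modelled as the integers $0, \dots, n-1$ under addition mod $n$; the helper names are illustrative only. Because $\mathbb{Z}_n$ is abelian, every subgroup is normal, so "simple" reduces to "exactly two subgroups".

```python
def cyclic_subgroups(n):
    """Return all subgroups of Z_n (addition mod n) as frozensets.

    Every subgroup of a cyclic group is cyclic, so it suffices to
    collect the subgroup generated by each element.
    """
    subgroups = set()
    for g in range(n):
        subgroups.add(frozenset((g * k) % n for k in range(n)))
    return subgroups


def is_simple_cyclic(n):
    """Z_n is abelian, so all subgroups are normal; simple means exactly two."""
    return len(cyclic_subgroups(n)) == 2


for n in range(1, 13):
    print(n, is_simple_cyclic(n))
```

The orders flagged as simple should be exactly the primes, and $n = 1$ (the trivial group) is rejected because it has only one normal subgroup.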

**Finite Groups**

* A group ($G$,*) is called a [finite group](https://de.m.wikipedia.org/wiki/Endliche_Gruppe) if $G$ is a finite set, i.e. has a finite number of elements.

* Assuming finiteness allows a simplified system of axioms.

A pair $(G, *)$ consisting of a finite set $G$ and an inner binary operation $*: G \times G \rightarrow G$ is called a group if the following axioms hold:

* Associativity: for all group elements $a, b, c$ we have $(a * b) * c=a *(b * c)$,

* [Cancellation law](https://de.m.wikipedia.org/wiki/Kürzbarkeit): $a * x=a * x^{\prime}$ or $x * a=x^{\prime} * a$ implies $x=x^{\prime}$.

The cancellation law implies that the left and right multiplications $x \mapsto a * x$ and $x \mapsto x * a$
are injective, and by finiteness they are also surjective. Hence there is an $x$ with
$a * x=a$, which leads to the existence of the identity element $e$, and then an $x$ with $a * x=e$, which
shows the existence of inverse elements.
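The simplified axioms can be tested mechanically on small examples. This is a sketch under the assumption that the operation is given as a Python function on a finite list of elements; `is_finite_group` and `identity_of` are hypothetical helper names, not part of any library.

```python
from itertools import product


def is_finite_group(elements, op):
    """Check the simplified axioms for a finite set: associativity and cancellation."""
    elements = list(elements)
    # Closure: op must map back into the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a*b)*c == a*(b*c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Cancellation: left and right multiplication by a are injective.
    for a in elements:
        if len({op(a, x) for x in elements}) != len(elements):
            return False
        if len({op(x, a) for x in elements}) != len(elements):
            return False
    return True


def identity_of(elements, op):
    """Find the neutral element e with e*x == x*e == x, if it exists."""
    elements = list(elements)
    for e in elements:
        if all(op(e, x) == x and op(x, e) == x for x in elements):
            return e
    return None


add5 = lambda a, b: (a + b) % 5
mul5 = lambda a, b: (a * b) % 5

print(is_finite_group(range(5), add5))       # integers mod 5 under addition
print(is_finite_group(range(5), mul5))       # 0 breaks the cancellation law
print(is_finite_group([1, 2, 3, 4], mul5))   # nonzero residues under multiplication
```

Note that the identity and inverses are never checked directly; as in the argument above, they follow from associativity and cancellation on a finite set.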

* The [List of small groups](https://en.wikipedia.org/wiki/List_of_small_groups) contains finite groups of small [order](https://en.wikipedia.org/wiki/Order_(group_theory)) [up to](https://en.wikipedia.org/wiki/Up_to) [group isomorphism](https://en.wikipedia.org/wiki/Group_isomorphism).

* The following list contains a selection of [finite groups of small order](https://de.m.wikipedia.org/wiki/Liste_kleiner_Gruppen).

  * This list can be used to find out which known finite group a given group G is isomorphic to.

  * First determine the order of G and compare it with the groups of the same order listed below.

  * If it is known whether G is abelian (commutative), some groups can be ruled out. Then compare the orders of individual elements of G with those of the elements of the listed groups, which allows G to be determined up to isomorphism.
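The element-order comparison described above can be illustrated on the two groups of order 4. This is a small sketch, assuming $\mathbb{Z}_4$ is modelled as integers mod 4 and $V_4$ as pairs of bits with componentwise addition mod 2; `element_orders` is an illustrative helper name.

```python
from collections import Counter
from itertools import product


def element_orders(elements, op, identity):
    """Return a Counter mapping element order -> number of elements of that order."""
    orders = Counter()
    for g in elements:
        power, k = g, 1
        while power != identity:
            power = op(power, g)
            k += 1
        orders[k] += 1
    return orders


# Z_4: integers mod 4 under addition.
z4 = list(range(4))
z4_orders = element_orders(z4, lambda a, b: (a + b) % 4, 0)

# V_4 = Z_2 x Z_2: pairs of bits under componentwise addition mod 2.
v4 = list(product(range(2), repeat=2))
v4_orders = element_orders(
    v4, lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2), (0, 0)
)

print("Z_4:", dict(z4_orders))
print("V_4:", dict(v4_orders))
```

Both groups are abelian of order 4, but only $\mathbb{Z}_4$ has elements of order 4, so the order statistics tell them apart.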

The following notation is used in the list below:

- $\mathbb{Z}_{n}$ is the cyclic group of order $n$ (also written $C_{n}$ or $\mathbb{Z} / n \mathbb{Z}$).

- $D_{n}$ is the dihedral group of order $2 n$.

- $S_{n}$ is the symmetric group of degree $n$, with $n !$ permutations of $n$ elements.

- $A_{n}$ is the alternating group of degree $n$, with $n ! / 2$ permutations of $n$ elements for $n \geq 2$.

- $\mathrm{Dic}_{n}$ is the dicyclic group of order $4 n$.

- $V_{4}$ is the [Klein four-group](https://de.m.wikipedia.org/wiki/Kleinsche_Vierergruppe) of order $4$.

- $Q_{4 n}$ is the quaternion group of order $4 n$ for $n \geq 2$.

[List of all groups up to order 20](https://de.m.wikipedia.org/wiki/Liste_kleiner_Gruppen#Liste_aller_Gruppen_bis_Ordnung_20)

https://www.spektrum.de/kolumne/endliche-einfache-gruppen-das-monster-und-der-laengste-beweis/2137146

**Finite Simple Groups**

* [Finite simple groups](https://de.m.wikipedia.org/wiki/Endliche_einfache_Gruppe) are regarded in group theory as the building blocks of the [finite groups](https://de.m.wikipedia.org/wiki/Endliche_Gruppe).

* The finite simple groups play a role for finite groups similar to that of the prime numbers for the natural numbers: every finite group can be "decomposed" into its simple groups (for the kind of uniqueness, see the Jordan-Hölder theorem).

* The reconstruction of a finite group from these "factors" is, however, not unique.

* There are no "even simpler groups" from which the finite simple groups could be constructed.

Although the finite simple groups had been regarded as completely classified since 1982, mathematicians around Aschbacher only completed the classification in 2002 with a 1200-page proof:

* Almost all of these groups belong to one of 18 families of finite simple groups.

* There are 26 exceptions. These groups are called **sporadic groups**. They include the Conway groups, the Baby Monster and the [**Monster group**](https://de.m.wikipedia.org/wiki/Monstergruppe) (with almost $10^{54}$ elements the largest sporadic group).

* The [sporadic groups](https://de.m.wikipedia.org/wiki/Sporadische_Gruppe) are 26 special groups in group theory. They are the [finite simple groups](https://de.m.wikipedia.org/wiki/Endliche_einfache_Gruppe) that do not fit into any of the [18 systematic families with infinitely many members](https://de.m.wikipedia.org/wiki/Endliche_einfache_Gruppe#Familien_endlicher_einfacher_Gruppen).

> https://www.quantamagazine.org/mathematicians-chase-moonshine-string-theory-connections-20150312/


*Classification of finite simple groups*

According to the [classification of finite simple groups](https://en.m.wikipedia.org/wiki/Classification_of_finite_simple_groups) (German: [Klassifikation](https://de.m.wikipedia.org/wiki/Endliche_einfache_Gruppe#Klassifikation)), every finite simple group is isomorphic to one of the following groups:

* a member of one of three infinite classes, namely:

  * (1) the [cyclic groups](https://de.m.wikipedia.org/wiki/Zyklische_Gruppe) of prime order,

  * (1) the [alternating groups](https://de.m.wikipedia.org/wiki/Alternierende_Gruppe) $A_{n}$ with $n>4$,

  * (16) the [groups of Lie type](https://de.m.wikipedia.org/wiki/Gruppe_vom_Lie-Typ) over a [finite field](https://de.m.wikipedia.org/wiki/Endlicher_Körper) (16 infinite families),

* (26) one of the 26 [sporadic groups](https://de.m.wikipedia.org/wiki/Sporadische_Gruppe),

* (1) the [Tits group](https://en.m.wikipedia.org/wiki/Tits_group) (which is sometimes considered a 27th sporadic group).


[2004: Classification of Quasithin group](https://en.m.wikipedia.org/wiki/Quasithin_group)

*Algebraic Structures (Module, Ring, Field)*

![ggg](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7a/%C3%9Cbersicht_K%C3%B6rper.svg/775px-%C3%9Cbersicht_K%C3%B6rper.svg.png)

**A [module](https://de.m.wikipedia.org/wiki/Modul_(Mathematik)) is the analogue of a vector space with scalars taken from a ring.**

* A module is an algebraic structure that generalizes a vector space.

* **A module is similar to a vector space, except that the scalars are only required to be elements of a ring (multiplicative inverses and commutativity of multiplication are NOT required).**

* For example, the set $\mathbb{Z}^n$ of n-dimensional vectors with integer entries forms a module, where "scalar multiplication" refers to multiplication by integer scalars.

The following number systems are additive groups and therefore $\mathbb {Z}$-modules:

* the integers $\mathbb {Z}$ themselves

* the rational numbers $\mathbb {Q}$

* the real numbers $\mathbb {R}$

* the algebraic numbers $\mathbb A$, or $\mathbb A \cap \mathbb R$

* the complex numbers $\mathbb {C}$


**A [ring](https://en.m.wikipedia.org/wiki/Ring_theory) is a set R with <u>two inner binary operations</u> "+" and "∙" such that:**

1. **Addition: (R, +) is an abelian group**

* Addition is associative and commutative;

* There is an additive identity, zero;

* Every element has an additive inverse;

2. **Multiplication: (R, ∙) is a semigroup**, which means:

* Semigroup for the multiplication in a ring: **only associativity is required, but no inverses, no identity element and no commutativity**

> **This means: both rings and fields require a commutative (abelian) group with respect to addition. For multiplication, the step from ring to field is made by strengthening the requirements.**

* Unlike a field, a ring is not required to have multiplicative inverses, and the multiplication is not required to be commutative.

> **A good example of a ring is the set of all n×n matrices under the operations of matrix addition and matrix multiplication.** [Matrix-Multiplication is non-commutative!](https://en.m.wikipedia.org/wiki/Matrix_multiplication#Non-commutativity)

$\mathbf{A B} \neq \mathbf{B A}$
For example

$
\left(\begin{array}{ll}
0 & 1 \\
0 & 0
\end{array}\right)\left(\begin{array}{ll}
0 & 0 \\
1 & 0
\end{array}\right)=\left(\begin{array}{ll}
1 & 0 \\
0 & 0
\end{array}\right)
$

but

$
\left(\begin{array}{ll}
0 & 0 \\
1 & 0
\end{array}\right)\left(\begin{array}{ll}
0 & 1 \\
0 & 0
\end{array}\right)=\left(\begin{array}{ll}
0 & 0 \\
0 & 1
\end{array}\right)
$

> **The integers Z also form a ring under the operations of addition and multiplication.**

3. **The distributive laws a*(b+c)=a*b+a*c and (a+b)*c = a*c+b*c hold for all a, b, c $\in R$.**

4. **The neutral element 0 of (R, +) is called the zero element of R.**
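The ring axioms and the non-commutativity example can be spot-checked on 2×2 integer matrices. This is a minimal sketch using plain nested lists; `matmul` and `matadd` are hypothetical helpers, and `C` is an arbitrary extra matrix chosen only for the check.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Add two matrices entrywise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
C = [[2, -1], [3, 5]]   # arbitrary third matrix for the axiom checks

AB = matmul(A, B)
BA = matmul(B, A)
print("AB =", AB)
print("BA =", BA)

# Multiplication is associative and distributes over addition ...
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))
print(matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C)))
# ... but it is not commutative, and a nonzero element can square to zero.
print(AB != BA, matmul(A, A) == [[0, 0], [0, 0]])
```

The last line shows why this ring is far from a field: `A` is nonzero, yet `A·A = 0`, so `A` cannot have a multiplicative inverse.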

https://de.m.wikipedia.org/wiki/Ring_(Algebra)

**A ring is called commutative** if its multiplication is commutative; otherwise it is called a non-commutative ring.

Examples:

* 2×2 real matrices.

* The most important example of a ring is the set ($\mathbb Z$,+,∙) of integers with the usual addition and multiplication. It is a commutative ring with identity and without zero divisors, i.e. an integral domain.

* the integers modulo some natural number greater than one;

* Likewise, the set ($\mathbb Q$,+,∙) of rational numbers with the usual addition and multiplication forms a ring. Since in this case not only ($\mathbb Q$,+) but also ($\mathbb Q$ \ {0},∙) is an abelian group, it is even a field; it is the field of fractions of the integral domain ($\mathbb Z$,+,∙).

* The set ($\mathbb N$,+,∙) of natural numbers with the usual addition and multiplication is not a ring, because addition on the natural numbers is not invertible.

https://www.quora.com/What-are-the-differences-between-rings-and-fields

**A [field (Körper)](https://de.m.wikipedia.org/wiki/K%C3%B6rper_(Algebra)) is a special kind of ring**:

> The elements of a field are not called vectors but scalars. A vector space is considered over the scalar field, and its elements are called vectors.

* **A field is a ring whose non-zero elements form a commutative group under multiplication (in short, a field is a commutative ring with unity in which every non-zero element has a multiplicative inverse).**

* A commutative unital ring that is not the zero ring is called a field if every non-zero element in it is multiplicatively invertible. Put differently, a field is a commutative unital ring $K$ whose group of units $K^{*}$ equals $K \backslash\{0\}$, i.e. is as large as possible.


A triple $(K,+,\cdot)$, consisting of a set $K$ and two binary operations "+" and "$\cdot$" (usually called addition and multiplication), is a field if and only if the following properties hold:

* $(K,+)$ is an abelian group (with neutral element 0)

* $(K \backslash\{0\}, \cdot)$ is an abelian group (with neutral element 1)

* $a \cdot(b+c)=a \cdot b+a \cdot c$ and $(a+b) \cdot c=a \cdot c+b \cdot c$ (distributive laws)

Additive properties:

* $a+(b+c)=(a+b)+c$ (associative law)

* $a+b=b+a$ (commutative law)

* There is an element $0 \in K$ with $0+a=a$ (neutral element)

* For every $a \in K$ there is an additive inverse $(-a)$ with $(-a)+a=0$

Multiplicative properties:

* $a \cdot(b \cdot c)=(a \cdot b) \cdot c$ (associative law)

* $a \cdot b=b \cdot a$ (commutative law)

* There is an element $1 \in K$ with $1 \cdot a=a$ (neutral element), and $1 \neq 0$.

* For every $a \in K \backslash\{0\}$ there is a multiplicative inverse $a^{-1}$ with $a^{-1} \cdot a=1$

Interplay of additive and multiplicative structure:

* $a \cdot(b+c)=a \cdot b+a \cdot c$ (left distributive law)

* The right distributive law $(a+b) \cdot c=a \cdot c+b \cdot c$ then follows from the other properties:
$(a+b) \cdot c=c \cdot(a+b)=c \cdot a+c \cdot b=a \cdot c+b \cdot c$

**Examples**

* The most familiar form of algebra is the elementary algebra that you learned in high school, namely the algebra of the real numbers. From an abstract point of view, this is the algebra of fields.

* Note that the axioms for a field are precisely the axioms for algebra on the real numbers. As a result, the real numbers R form a field under the usual operations of addition and multiplication. However, the real numbers are not the only possible field. Indeed, you are already familiar with a few other examples:

* The rational numbers Q form a field under the usual operations of addition and multiplication. In particular, we can add or multiply two elements of Q to obtain another element of Q, and these operations obey all of the axioms listed above.

* The complex numbers C form a field under the commonly defined operations of addition and multiplication. Complex numbers do obey all of the listed axioms for a field, which is why elementary algebra works as usual for complex numbers.

* The Integers modulo a Prime number.

*An example of a set of numbers that is **not a field** is the set of integers. It is an "integral domain." It is not a field because it lacks multiplicative inverses. Without multiplicative inverses, division may be impossible.*

* Both are algebraic objects with a notion of addition and multiplication, **but the multiplication in a field is more specialized**: it is necessarily commutative and every nonzero element has a multiplicative inverse.
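The difference between "integers modulo a natural number" (a ring) and "integers modulo a prime" (a field) shows up directly when looking for multiplicative inverses. The following is a brute-force sketch with illustrative helper names, intended only for small moduli.

```python
from math import gcd


def units_mod(n):
    """Elements of Z_n that have a multiplicative inverse (coprime to n)."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def inverse_mod(a, n):
    """Brute-force search for b with a*b == 1 (mod n); None if none exists."""
    for b in range(1, n):
        if (a * b) % n == 1:
            return b
    return None


def is_field_mod(n):
    """Z_n is a field exactly when every nonzero element is a unit."""
    return n > 1 and len(units_mod(n)) == n - 1


print(units_mod(6), is_field_mod(6))   # only some nonzero residues are invertible
print(units_mod(7), is_field_mod(7))   # every nonzero residue is invertible
print(inverse_mod(3, 7), inverse_mod(2, 6))
```

In $\mathbb{Z}_6$ the element 2 has no inverse (and $2 \cdot 3 = 0$), whereas in $\mathbb{Z}_7$ every nonzero element can be inverted, which is what division requires.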


**See also**

* [Algebraic structure](https://de.m.wikipedia.org/wiki/Algebraische_Struktur)

* [Outline_of_algebraic_structures](https://en.m.wikipedia.org/wiki/Outline_of_algebraic_structures)

* [Mathematical structures](https://de.m.wikipedia.org/wiki/Mathematische_Struktur) (alongside topological structures, geometric structures and number systems)

* [Geometric group theory](https://de.wikipedia.org/wiki/Geometrische_Gruppentheorie) (group actions on graphs and metric spaces; ultimately the groups themselves become such geometric objects)

* [Operations](https://de.m.wikipedia.org/wiki/Verknüpfung_(Mathematik))

* [Gruppenoperation](https://de.wikipedia.org/wiki/Gruppenoperation)

Video: [Icosahedral Symmetry](https://youtu.be/B-WI8JZR140)

###### *Lie Group*

> Video: [Lie Groups](https://youtu.be/ZRca3Ggpy_g)

Lie Groups Based Machine Learning
- Lie Group Forced Variational Integrator Networks for Learning and Control of Robot Systems
https://lnkd.in/e-r-ChbX
- Structure preserving deep learning
https://lnkd.in/gPghU6x
- Lie Group Cohomology and (Multi)Symplectic Integrators: New Geometric Tools for Lie Group Machine Learning Based on Souriau Geometric Statistical Mechanics
https://lnkd.in/dCPaWBU

more information at SEE GSI'23:
https://gsi2023.org/


From: [QHack 2022: Marco Cerezo —Barren plateaus and overparametrization in quantum neural networks](https://www.youtube.com/watch?v=rErONNdHbjg)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1401.png)

![geometry](https://raw.githubusercontent.com/deltorobarba/repo/master/sciences_1400.png)

**Lie Theory**

> “The essential phenomenon of Lie theory is that one may associate in a natural way to a Lie group $\mathcal{G}$ its Lie algebra $\mathfrak{g}$. The Lie algebra $\mathfrak{g}$ is first of all a vector space and secondly is endowed with a bilinear nonassociative product called the Lie bracket [...]. **Amazingly, the group $\mathcal{G}$ is almost completely determined by $\mathfrak{g}$ and its Lie bracket**. Thus for many purposes <font color="blue">**one can replace $\mathcal{G}$ with $\mathfrak{g}$. Since $\mathcal{G}$ is a complicated nonlinear object and $\mathfrak{g}$ is just a vector space, it is usually vastly simpler to work with $\mathfrak{g}$**</font>. [...] This is one source of the power of Lie theory." Stillwell: “the miracle of Lie theory”. (*https://arxiv.org/pdf/1812.01537.pdf*)

* Article about [Lie-Gruppe](https://de.m.wikipedia.org/wiki/Lie-Gruppe)

* See also [Lie algebra representation](https://en.m.wikipedia.org/wiki/Lie_algebra_representation) and [representation of a Lie group](https://en.m.wikipedia.org/wiki/Representation_of_a_Lie_group) (a linear action of a Lie group on a vector space). Representations play an important role in the study of continuous symmetry.

* See this very good paper: [A micro Lie theory for state estimation in robotics](https://arxiv.org/abs/1812.01537)

* Lie theory for the roboticist: https://www.youtube.com/watch?v=nHOcoIyJj2o&t=3147s

* https://github.com/artivis/manif/blob/devel/paper/Lie_theory_cheat_sheet.pdf

**Lie Group**

* A [Lie group](https://en.wikipedia.org/wiki/Lie_group) is a group that is also a **differentiable manifold**. A manifold is a space that locally resembles Euclidean space, whereas groups define the abstract, generic concept of **multiplication and the taking of inverses (division)**. Combining these two ideas, one obtains a continuous group where points can be multiplied together, and their inverse can be taken.

* Lie groups provide a natural model for the concept of **continuous symmetry**, a celebrated example of which is the rotational symmetry in three dimensions (given by the special orthogonal group ${\text{SO}}(3)$).

* Lie groups were first found by studying matrix subgroups
$G$ contained in ${\text{GL}}_{n}(\mathbb {R} )$ or ${\text{GL}}_{n}(\mathbb {C} )$, the groups of $n\times n$ invertible matrices over $\mathbb {R}$  or $\mathbb {C}$ (now called the classical groups).

* Lie's original motivation for introducing Lie groups was to model the continuous symmetries of differential equations, in much the same way that finite groups are used in Galois theory to model the discrete symmetries of algebraic equations.
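For the rotation group ${\text{SO}}(3)$ mentioned above, group membership and closure can be checked numerically. This is a sketch in plain Python, assuming rotations are stored as nested lists; the helper names (`rot_x`, `rot_z`, `is_rotation`) are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rot_x(t):
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]


def rot_z(t):
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]


def is_rotation(r, tol=1e-12):
    """Membership test for SO(3): R^T R = I and det R = +1."""
    rtr = matmul(transpose(r), r)
    orthogonal = all(abs(rtr[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(r) - 1.0) < tol


a, b = rot_x(0.7), rot_z(0.3)
print(is_rotation(matmul(a, b)))      # closure under multiplication
print(is_rotation(transpose(a)))      # the inverse of a rotation is its transpose
print(matmul(a, b) == matmul(b, a))   # rotations about different axes need not commute
```

The angles enter smoothly through sine and cosine, which is the "differentiable manifold" half of the definition; the matrix product supplies the group half.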

**Examples of Lie groups**

> [Table of some common Lie groups and their associated Lie algebras](https://en.m.wikipedia.org/wiki/Table_of_Lie_groups)

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_01.png)

* **The unit circle in the complex plane**, i.e. the set $S^{1}=\{z \in \mathbb{C}:|z|=1\}$ of complex numbers of absolute value 1, is a subgroup of $\left(\mathbb{C}^{*}, \cdot\right)$, the so-called **circle group**: the product of two numbers of absolute value 1 again has absolute value 1, and so does the inverse. Here too the group structure is "compatible with differential calculus", i.e. it is a Lie group.

* **The set $\mathbb{C}^{*}=\mathbb{C} \backslash\{0\}$ of non-zero complex numbers forms a group $\left(\mathbb{C}^{*}, \cdot\right)$ under ordinary multiplication**. The multiplication is a differentiable map $m: \mathbb{C}^{*} \times \mathbb{C}^{*} \rightarrow \mathbb{C}^{*}$ defined by $m(x, y)=x y$; the inversion $i: \mathbb{C}^{*} \rightarrow \mathbb{C}^{*}$ defined by $i(z)=z^{-1}=\frac{1}{z}$ is differentiable as well. The group structure of the punctured complex plane (with respect to multiplication) is therefore compatible with differential calculus.

* [Examples of Lie groups](https://de.wikipedia.org/wiki/Lie-Gruppe#Beispiele) include: general linear group, orthogonal group, unitary group and special unitary group, affine group, [Poincaré group](https://en.m.wikipedia.org/wiki/Poincaré_group), Galilean group.

* **Simple Lie Groups**: a simple Lie group is a connected non-abelian Lie group G which does not have nontrivial connected normal subgroups. The list of simple Lie groups can be used to read off the list of simple Lie algebras and [Riemannian symmetric spaces](https://en.wikipedia.org/wiki/Symmetric_space). See also [Classification of semisimple Lie algebras](https://en.wikipedia.org/wiki/Dynkin_diagram#Classification_of_semisimple_Lie_algebras)
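The circle group from the first example above is small enough to probe with Python's built-in complex numbers. This is a sketch; the angles are arbitrary and `on_circle` is an illustrative helper.

```python
import cmath


def on_circle(z, tol=1e-12):
    """True if z lies on the unit circle S^1 up to rounding error."""
    return abs(abs(z) - 1.0) < tol


z1 = cmath.exp(1j * 0.4)   # point on S^1 at angle 0.4
z2 = cmath.exp(1j * 2.5)   # point on S^1 at angle 2.5

product_z = z1 * z2        # group operation
inverse_z = 1 / z1         # group inverse

print(on_circle(product_z), on_circle(inverse_z))
# Multiplying on the circle corresponds to adding angles.
print(abs(product_z - cmath.exp(1j * (0.4 + 2.5))) < 1e-12)
```

The map $\theta \mapsto e^{i\theta}$ used here turns addition of angles into multiplication on the circle, a one-dimensional instance of the exponential map between a Lie algebra and its Lie group.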


**Lie Algebra**

* a [Lie algebra](https://en.m.wikipedia.org/wiki/Lie_algebra) **is a vector space $\mathfrak{g}$ together with an operation called the Lie bracket**, an alternating bilinear map $\mathfrak{g} \times \mathfrak{g} \rightarrow \mathfrak{g},(x, y) \mapsto[x, y]$, that satisfies the Jacobi identity.

* The vector space $\mathfrak{g}$ together with this operation is a non-associative algebra, meaning that the Lie bracket is not necessarily associative.

> <font color="blue">**Any Lie group gives rise to a Lie algebra, which is its tangent space at the identity.**

* In physics, Lie groups appear as symmetry groups of physical systems, and their Lie algebras (tangent vectors near the identity) may be thought of as infinitesimal symmetry motions. Thus Lie algebras and their representations are used extensively in physics, notably in quantum mechanics and particle physics.

* A [Lie algebra](https://de.m.wikipedia.org/wiki/Lie-Algebra) is an algebraic structure equipped with a Lie bracket, i.e. there is an antisymmetric operation that satisfies the Jacobi identity.

* Lie algebras are mainly used to study geometric objects such as Lie groups and differentiable manifolds.

* **Simple Lie Algebra**: a [simple Lie algebra](https://en.wikipedia.org/wiki/Simple_Lie_algebra) is a Lie algebra that is nonabelian and contains no nonzero proper ideals. The classification of real simple Lie algebras is one of the major achievements of Wilhelm Killing and Élie Cartan. A direct sum of simple Lie algebras is called a semisimple Lie algebra. A simple Lie group is a connected Lie group whose Lie algebra is simple.

* **Semisimple Lie algebra**: a [Lie algebra is semisimple](https://en.wikipedia.org/wiki/Semisimple_Lie_algebra) if it is a direct sum of simple Lie algebras (non-abelian Lie algebras without any non-zero proper ideals).
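For matrix Lie algebras the bracket is the commutator $[A, B] = AB - BA$. The following sketch checks the defining properties on three sample 2×2 integer matrices (chosen here only for illustration), using exact integer arithmetic.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    return [[-x for x in row] for row in a]


def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    return add(matmul(a, b), neg(matmul(b, a)))


X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
H = [[1, 0], [0, -1]]
ZERO = [[0, 0], [0, 0]]

# Alternating / antisymmetric: [X, X] = 0 and [X, Y] = -[Y, X].
print(bracket(X, X) == ZERO, bracket(X, Y) == neg(bracket(Y, X)))

# Jacobi identity: [X,[Y,H]] + [Y,[H,X]] + [H,[X,Y]] = 0.
jacobi = add(add(bracket(X, bracket(Y, H)), bracket(Y, bracket(H, X))),
             bracket(H, bracket(X, Y)))
print(jacobi == ZERO)

# Not associative: [[X, X], Y] differs from [X, [X, Y]].
print(bracket(bracket(X, X), Y) != bracket(X, bracket(X, Y)))
```

The last check illustrates why a Lie algebra is called a non-associative algebra: the Jacobi identity takes the place of associativity.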

**Lie algebra of a Lie group**

* The vector fields on a smooth manifold $M$ form an infinite-dimensional Lie algebra with the Lie bracket. The Lie algebra $\mathfrak{g}$ belonging to a Lie group $G$ consists of the subspace of [left-invariant](https://de.m.wikipedia.org/wiki/Translationsinvarianz) vector fields on $G$.

* This vector space is isomorphic to the tangent space $T_{e} G$ at the identity element $e$ of $G$. In particular, $\operatorname{dim} G=\operatorname{dim} \mathfrak{g}$. The vector space $\mathfrak{g}$ is closed under the Lie bracket $[\cdot, \cdot]$.

* **Thus the tangent space of a Lie group $G$ at the identity element is a Lie algebra. This Lie algebra is called the Lie algebra of the Lie group $G$.**

* For every Lie group $G$ with Lie algebra $\mathfrak{g}$ there is an **exponential map** $\exp: \mathfrak{g} \rightarrow G$. It can be defined by $\exp (A)=\Phi_{1}(e)$, where $\Phi_{t}$ is the flow of the left-invariant vector field $A$ and $e \in G$ is the identity element. If $G$ is a closed subgroup of $\mathrm{GL}(n, \mathbb{R})$ or $\mathrm{GL}(n, \mathbb{C})$, the exponential map defined in this way coincides with the matrix exponential.


* Every inner product on $T_{e} G=\mathfrak{g}$ defines a $G$-left-invariant Riemannian metric on $G$. In the special case that this metric is also right-invariant, the exponential map of the Riemannian manifold $G$ at the point $e$ coincides with the Lie group exponential map.

* The Lie group $\mathrm{SO}(n)$ has an associated Lie algebra $\mathfrak{s o}(n)$, a vector space with a bilinear alternating product (the Lie bracket), under which the vector space is closed.

* This vector space is isomorphic to the tangent space at the identity element of $\mathrm{SO}(n)$ (the identity element is the identity matrix), so that in particular $\operatorname{dim} \mathfrak{s o}(n)=\operatorname{dim} \mathrm{SO}(n)$.

* The Lie algebra consists of all skew-symmetric $n \times n$ matrices; the elements of a basis are called generators.

* The exponential map connects the Lie algebra with the Lie group:

> $\exp : \mathfrak{s o}(n) \rightarrow \mathrm{SO}(n), J \mapsto \sum_{k=0}^{\infty} \frac{1}{k !} J^{k}$

Exponentiation in quantum: $\mathbb{G} = e^\mathfrak{g}$
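The series $\exp(J)=\sum_k J^k/k!$ can be evaluated directly for a small skew-symmetric matrix. This is a sketch that simply truncates the series (an assumption made for illustration; production code would normally use a library routine), applied to a generator of rotations about the z-axis.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_series(J, terms=30):
    """Approximate exp(J) = sum_k J^k / k! by its first `terms` terms."""
    n = len(J)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes J^k / k! from J^(k-1) / (k-1)!
        term = [[x / k for x in row] for row in matmul(term, J)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


theta = 0.5
J = [[0.0, -theta, 0.0],
     [theta, 0.0, 0.0],
     [0.0, 0.0, 0.0]]      # skew-symmetric: element of so(3), rotation about z

R = exp_series(J)
for row in R:
    print([round(x, 6) for x in row])
print(round(math.cos(theta), 6), round(math.sin(theta), 6))
```

The result should agree with the familiar rotation matrix built from $\cos\theta$ and $\sin\theta$, showing how a skew-symmetric element of $\mathfrak{so}(3)$ is mapped to a rotation in $\mathrm{SO}(3)$.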

**Why using Lie Algebra?**

* For example, $\mathcal{G}$ is the set of all rotations of a ball (nonlinear): Lie groups are continuous transformation groups, such as 3D rotations, which live on a curved surface and are complicated to work with. An element of the Lie algebra $\mathfrak{so}(3)$ is the skew-symmetric matrix $
\boldsymbol{W}^{\wedge}=[\boldsymbol{\omega}]_{\times}=\left[\begin{array}{ccc}
0 & -\omega_{z} & \omega_{y} \\
\omega_{z} & 0 & -\omega_{x} \\
-\omega_{y} & \omega_{x} & 0
\end{array}\right]
$

* Meanwhile, the Lie algebra $\mathfrak{g}$ is the tangent space at the origin, which is a linear vector space; in Cartesian $\mathbb{R}^3$ coordinates: $
\omega=\left(\omega_{x}, \omega_{y}, \omega_{z}\right)
$

> **Tangent space at the origin is called the "Lie Algebra"** $\rightarrow$ Exponential map translates between both

**Lie groups** were formerly known as "continuous transformation groups". **A Lie group is a group that is also a smooth (differentiable) manifold:**

  * **a smooth manifold whose elements satisfy the group axioms**

  * (so no singularities or breaks where differentiation or integration wouldn't work anymore)

  * each point on a manifold represents one element of the Lie Group (i.e. 3d rotation matrix on the manifold)

  * corresponding to it in the cartesian tangent space you can find a 3d rotation vector

**Lie algebra**: the tangent space at the origin (identity element) of the manifold (https://www.youtube.com/watch?v=nHOcoIyJj2o&t=3147s)

**Exponential map** from the (Cartesian) tangent space to the manifold (e.g. a 3D surface):

  * from a tangent vector a to the manifold ('**exponential of a**'): for example, the exponential of a rotation vector (in the tangent space) is the 3D rotation matrix (on the manifold)

  * and back: a point x on the manifold is mapped by its '**logarithm of x**' back to the tangent space

  * it is an exact operation, not an approximation

  * explanation and proof here: https://www.youtube.com/watch?v=nHOcoIyJj2o&t=3147s
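The exponential and logarithm maps for 3D rotations have closed forms (Rodrigues' formula). The following is a sketch of the round trip between a rotation vector and its rotation matrix; the function names `so3_exp` and `so3_log` are illustrative, and the logarithm as written assumes a rotation angle strictly between 0 and π.

```python
import math


def hat(w):
    """Rotation vector -> skew-symmetric matrix [w]_x."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Rodrigues' formula: rotation vector (tangent space) -> rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    W = hat(w)
    W2 = matmul(W, W)
    if theta < 1e-12:
        # Use the limits sin(t)/t -> 1 and (1 - cos t)/t^2 -> 1/2 near zero.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[float(i == j) + a * W[i][j] + b * W2[i][j] for j in range(3)]
            for i in range(3)]


def so3_log(R):
    """Rotation matrix -> rotation vector, for angles strictly between 0 and pi."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-12:
        return (0.0, 0.0, 0.0)
    s = theta / (2.0 * math.sin(theta))
    return (s * (R[2][1] - R[1][2]), s * (R[0][2] - R[2][0]), s * (R[1][0] - R[0][1]))


w = (0.3, -0.2, 0.5)
R = so3_exp(w)
w_back = so3_log(R)
print(w_back)
```

Up to floating-point rounding, `w_back` reproduces `w`, which reflects the statement that exp and log are exact inverse operations on this range and not approximations.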


![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_07.png)

**The Lie algebra is the tangent space at the origin (= identity element) of the manifold!**

> **The tangent space at the origin is called the "Lie algebra".** **Each time you take a derivative on the manifold, you leave the manifold**, but you can stay in the tangent space for this operation (because derivatives are well defined there)

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_08.png)

*https://arxiv.org/pdf/1812.01537.pdf*

* **antipodal point** to the origin: starting from the origin, go half a turn around a 3D ball and you end up at the antipodal point; there are many ways to get there (any vector of length π)

* The tangent space covers the manifold multiple times!

  * "**First cover of the manifold by the tangent space**": every tangent vector of length π ends up at the antipodal point

  * this shows the relationship between the Lie group and the tangent space of the manifold
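The multiple covering can be seen numerically for 3D rotations: different tangent vectors along the same axis produce the same rotation matrix. This is a sketch using the axis-angle form of Rodrigues' formula; the axis `u` and the helper names are arbitrary choices for illustration.

```python
import math


def rotation(axis, angle):
    """Rotation matrix for a unit `axis` and an `angle`, via Rodrigues' formula."""
    x, y, z = axis
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    return [[float(i == j) + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


u = (0.0, 0.6, 0.8)                      # a unit axis, chosen arbitrarily
half_turn_plus = rotation(u, math.pi)    # tangent vector  pi * u
half_turn_minus = rotation(u, -math.pi)  # tangent vector -pi * u

# Opposite tangent vectors of length pi give the same half-turn.
print(close(half_turn_plus, half_turn_minus))
# Adding a full turn to the angle gives the same rotation again.
print(close(rotation(u, 0.4), rotation(u, 0.4 + 2 * math.pi)))
```

Both comparisons should report equality up to rounding, so the map from tangent vectors to rotations is many-to-one once the vector length reaches π.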

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_02.png)

* In the following image, $\mathbb{R}^{m}$ and $T_{E} M$ are the same vector space but with different representations

> **Lie algebra $T_{E} M \sim \mathbb{R}^{m} \text { (Cartesian tangent space) and } w \sim w^{\wedge}$**

* This means that $T_{E} M$ is isomorphic to $\mathbb{R}^{m}$

* Since you can always go to an $\mathbb{R}^{m}$ space, any Lie group has a Cartesian $\mathbb{R}^{m}$ tangent space, which is a vector space

* you can always go from one to the other, given the isomorphism

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_03.png)

**SO(3): Lie group of rotation matrices in 3D**

* you can write [w]<sub>x</sub> as a linear combination of 3 basis matrices $E$<sub>x, y, z</sub>, which simplifies the calculation compared with working directly with [w]<sub>x</sub>

* and since $T_{E} M \sim \mathbb{R}^{m}$ and $w \sim w^{\wedge}$ (isomorphism), it is also an allowed (exact) operation

* Let's write a tangent vector as a regular Cartesian vector with 3 coordinates

> $[\omega]_{\times}=\omega_{x} \mathbf{E}_{x}+\omega_{y} \mathbf{E}_{y}+\omega_{z} \mathbf{E}_{z}$

* The matrix product $[\omega]_{\times} v$ is equivalent to the cross product $\omega \times v$ with the vector $\omega=\left(\omega_{x}, \omega_{y}, \omega_{z}\right)$; the matrix $[\omega]_{\times}$ and the vector $\omega$ are two ways of representing the same element of the tangent space

* looking at the previous slide: Cartesian R<sup>m</sup> is easier to work with than the Lie algebra T<sub>E</sub>M
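The decomposition into basis matrices and the link to the cross product can be verified with exact integer arithmetic. This is a sketch in which the generators `E_X`, `E_Y`, `E_Z` are written out by hand from the skew-symmetric form of $[\omega]_{\times}$, and the sample vectors are arbitrary.

```python
E_X = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
E_Y = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
E_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]


def hat(w):
    """[w]_x as the linear combination w_x E_x + w_y E_y + w_z E_z."""
    wx, wy, wz = w
    return [[wx * E_X[i][j] + wy * E_Y[i][j] + wz * E_Z[i][j] for j in range(3)]
            for i in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply(m, v):
    """Matrix-vector product."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


w, v = (1, 2, 3), (4, 5, 6)
print(hat(w))
print(apply(hat(w), v), cross(w, v))
```

Applying the matrix $[\omega]_{\times}$ to a vector gives the same result as taking the cross product with $\omega$, which is the isomorphism between the two representations in concrete form.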

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_04.png)

![gg](https://raw.githubusercontent.com/deltorobarba/repo/master/liegroup_05.png)

**An elementary example is the space of three dimensional vectors $\mathfrak{g}=\mathbb{R}^{3}$** with the bracket operation defined by the cross product $[x, y]=x \times y$. This is skew-symmetric since $x \times y=-y \times x$, and instead of associativity it satisfies the Jacobi identity:

> $
x \times(y \times z)=(x \times y) \times z+y \times(x \times z)
$

* **This is the Lie algebra of the Lie group of rotations of space**, and each vector $v \in \mathbb{R}^{3}$ may be pictured as an infinitesimal rotation around the axis $v$, with velocity equal to the magnitude of $v$.

* The Lie bracket is a measure of the noncommutativity between two rotations: since a rotation commutes with itself, we have the alternating property $[x, x]=x \times x=0$.
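The bracket properties of the cross product stated above can be confirmed on integer vectors, where the arithmetic is exact. This is a small sketch with arbitrarily chosen sample vectors.

```python
def cross(a, b):
    """Cross product of two 3-vectors, used here as the Lie bracket on R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


x, y, z = (1, 2, 3), (-2, 0, 5), (4, -1, 7)

# Jacobi identity in the form x × (y × z) = (x × y) × z + y × (x × z).
lhs = cross(x, cross(y, z))
rhs = add(cross(cross(x, y), z), cross(y, cross(x, z)))
print(lhs, rhs)

# Alternating property and failure of associativity.
print(cross(x, x))
print(cross(x, cross(y, z)) == cross(cross(x, y), z))
```

The two sides of the Jacobi identity agree, while the last comparison shows that the cross product itself is not associative.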

###### *Category Theory*

[A monad is a monoid in the category of endofunctors](https://youtu.be/ENo_B8CZNRQ)

Category theory and group theory are both branches of abstract algebra, but they are quite different in focus and scope.

1. **Group Theory**: Group theory studies the algebraic structures known as groups, which are sets equipped with an operation that combines any two of its elements to form a third element in a way that satisfies four conditions: closure, associativity, identity, and invertibility. Group theory is a very concrete branch of abstract algebra, and its concepts are used throughout mathematics.

2. **Category Theory**: Category theory is a higher level of abstraction. In category theory, the emphasis is not on the individual objects (like groups, rings, vector spaces, etc.) but rather on the morphisms or maps between these objects and the structure-preserving transformations that can be defined. Categories can be thought of as an abstract way to represent mathematical concepts and their relationships.

The relationship between these two branches of mathematics is that group theory can be viewed through the lens of category theory:

- A group itself can be viewed as a category. Specifically, it is a category with a single object in which every morphism (the morphisms represent the group elements) is invertible.

- Furthermore, the concept of a functor in category theory (a map between categories that preserves their structure) corresponds to the concept of a group homomorphism in group theory (a function between two groups that preserves the group operation).

So, category theory provides a unifying framework in which group theory and many other branches of mathematics can be studied. It's a way of looking at mathematical structures and their relationships at a higher level of abstraction.
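The two observations above can be made concrete with $\mathbb{Z}_4$ and $\mathbb{Z}_2$. In this sketch each group is treated as a one-object category whose morphisms are the group elements and whose composition is addition; the map `F` (reduction mod 2) is an illustrative choice of homomorphism.

```python
from itertools import product

Z4 = range(4)          # morphisms of the one-object category for Z_4
Z2 = range(2)          # morphisms of the one-object category for Z_2


def compose4(a, b):
    return (a + b) % 4


def compose2(a, b):
    return (a + b) % 2


def F(a):
    """Candidate functor on morphisms: reduce mod 2 (a homomorphism Z_4 -> Z_2)."""
    return a % 2


# Functor laws: identities and composition are preserved.
preserves_identity = F(0) == 0
preserves_composition = all(F(compose4(a, b)) == compose2(F(a), F(b))
                            for a, b in product(Z4, repeat=2))

# Group-as-category: every morphism has an inverse.
every_morphism_invertible = all(any(compose4(a, b) == 0 for b in Z4) for a in Z4)

print(preserves_identity, preserves_composition, every_morphism_invertible)
```

Preserving composition is exactly the homomorphism condition $F(a+b) = F(a) + F(b)$, which is why functors between one-object categories of this kind correspond to group homomorphisms.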


[Category](https://en.m.wikipedia.org/wiki/Category_(mathematics)) and [Category theory](https://en.m.wikipedia.org/wiki/Category_theory)

* A category consists of objects and morphisms. To relate two categories, one assigns to each object of one category an object of the other category, and does the same for the morphisms between the objects.

* [Objects](https://ncatlab.org/nlab/show/object)

* [Product and coproduct](https://de.m.wikipedia.org/wiki/Produkt_und_Koprodukt)

* [Initial object, terminal (final) object and zero object](https://de.m.wikipedia.org/wiki/Anfangsobjekt,_Endobjekt_und_Nullobjekt)

* [Functor](https://ncatlab.org/nlab/show/functor): A functor is a homomorphism between categories, i.e. a structure-preserving assignment from one category to another.

  * Functors are also called diagrams (sometimes only in certain contexts), since they provide a formal abstraction of [commutative diagrams](https://de.m.wikipedia.org/wiki/Kommutatives_Diagramm).

  * functors must preserve [identity morphisms](https://en.m.wikipedia.org/wiki/Morphism#Definition) and [composition of morphisms](https://en.m.wikipedia.org/wiki/Function_composition)

* [Natural transformations](https://en.m.wikipedia.org/wiki/Natural_transformation) are maps between functors (and functors are maps between categories)

* [Duality](https://en.m.wikipedia.org/wiki/Dual_(category_theory)) is a correspondence between the properties of a category C and the dual properties of the opposite category C<sup>op</sup>.

* Topos: Category theorists have proposed [topos theory](https://en.m.wikipedia.org/wiki/Topos) as an alternative to traditional [axiomatic set theory](https://de.m.wikipedia.org/wiki/Axiomatische_Mengenlehre). Topos theory can interpret various alternatives to that theory, such as constructivism, finite set theory, and computable set theory.
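The functor laws (preserving identities and composition) and the naturality condition from the list above can be illustrated with Python lists. This is only a sketch under assumptions: it treats Python types and functions as a stand-in for a category, uses the list construction as the functor, and uses list reversal as the candidate natural transformation. All names (`fmap`, `compose`, `reverse`, the sample functions) are illustrative, and checking one sample list is a spot test, not a proof.

```python
def identity(x):
    return x

def compose(g, f):
    """Return the composite 'g after f'."""
    return lambda x: g(f(x))

def fmap(f):
    """List functor on morphisms: lift f: A -> B to a map list[A] -> list[B]."""
    return lambda xs: [f(x) for x in xs]

def reverse(xs):
    """Candidate natural transformation from the list functor to itself."""
    return xs[::-1]

f = lambda n: n + 1   # int -> int
g = str               # int -> str
sample = [1, 2, 3]

# Functor law 1: identity morphisms are preserved.
identity_law = fmap(identity)(sample) == identity(sample)

# Functor law 2: composition of morphisms is preserved.
composition_law = fmap(compose(g, f))(sample) == compose(fmap(g), fmap(f))(sample)

# Naturality: reversing then mapping equals mapping then reversing.
naturality = reverse(fmap(f)(sample)) == fmap(f)(reverse(sample))

print(identity_law, composition_law, naturality)  # True True True
```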

> [Outline of category theory](https://en.m.wikipedia.org/wiki/Outline_of_category_theory)



**Category Theory and Higher Category Theory**

0) A category is a collection of objects and morphisms between those objects that satisfy some rules.

1) A functor is a morphism in the category of categories.

2) A natural transformation is a morphism in the category of functors.

But most introductions stop right there. What about:

3) the morphisms in the category of natural transformations?

4) Or the "morphisms in the category of the morphisms in the category of natural transformations"

*Definition*

* Higher category theory (see [Wikipedia](https://en.m.wikipedia.org/wiki/Higher_category_theory) or [nLab](https://ncatlab.org/nlab/show/higher+category+theory)) is the part of category theory at a higher order, which means that some equalities are replaced by explicit arrows in order to be able to explicitly study the structure behind those equalities.

* Higher category theory is often applied in algebraic topology (especially in homotopy theory), where one studies algebraic invariants of spaces, such as their fundamental weak ∞-groupoid.

* *From 2-category to Higher Order Category*: The concept of 2-category generalizes further in higher category theory to $n$-categories, which have $k$-morphisms for all $k \le n$. The morphisms can be composed along the objects, while the 2-morphisms can be composed in two different directions: along objects (called horizontal composition) and along morphisms (called vertical composition). The composition of morphisms is allowed to be associative only up to coherent associator 2-morphisms.

* See also: [infinity-category](https://ncatlab.org/nlab/show/infinity-category)

*Morphisms (Linear Maps)*

A [morphism](https://de.m.wikipedia.org/wiki/Morphismus) is the category-theoretic notion of a structure-preserving map between objects. One writes $f\colon X\to Y$. Image source: [Morphismen](https://youtu.be/0wKsFNLR15g)

![xxx](https://raw.githubusercontent.com/deltorobarba/repo/master/morphismus2.jpg)

* [Automorphism](https://de.m.wikipedia.org/wiki/Automorphismus): bijective, domain = codomain (an isomorphism from an object to itself)

* [Isomorphism](https://de.m.wikipedia.org/wiki/Isomorphismus): bijective with a structure-preserving inverse; domain and codomain may be different objects

* [Endomorphism](https://de.m.wikipedia.org/wiki/Endomorphismus): domain = codomain. It need not be injective or surjective, so the image may be a proper subspace of the vector space

* [Monomorphism](https://de.m.wikipedia.org/wiki/Monomorphismus): for linear maps, injective (distinct elements of the domain have distinct images)

* [Epimorphism](https://de.m.wikipedia.org/wiki/Epimorphismus): for linear maps, surjective (every element of the codomain is hit at least once)
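For linear maps given by a matrix, the classification above reduces to a rank computation: a map $x \mapsto Ax$ with $A$ an $m \times n$ matrix is injective exactly when $\operatorname{rank} A = n$ and surjective exactly when $\operatorname{rank} A = m$. The following is a minimal sketch under that assumption, using exact rational arithmetic from the standard library; the helper names `rank` and `classify` are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def classify(matrix):
    """Classify x -> A x, with A an m x n matrix, as a linear map R^n -> R^m."""
    m, n = len(matrix), len(matrix[0])
    k = rank(matrix)
    labels = []
    if k == n:
        labels.append("monomorphism")   # injective
    if k == m:
        labels.append("epimorphism")    # surjective
    if k == n == m:
        labels.append("isomorphism")    # bijective
    if m == n:
        labels.append("endomorphism")   # domain = codomain
    if k == n == m:
        labels.append("automorphism")   # bijective endomorphism
    return labels

print(classify([[1, 0], [0, 1], [0, 0]]))  # ['monomorphism']
print(classify([[1, 0, 0], [0, 1, 0]]))    # ['epimorphism']
print(classify([[1, 1], [1, 1]]))          # ['endomorphism']
```

The last example is an endomorphism of $\mathbb{R}^2$ whose image is only a line, so it is neither injective nor surjective.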

**Deep Dive: Homomorphism**

> A linear map is a (homo)morphism between vector spaces.

*Difference between isomorphism and homomorphism:*

![xx](https://raw.githubusercontent.com/deltorobarba/repo/master/isomorphismus.JPG)

A [vector space homomorphism](https://de.m.wikipedia.org/wiki/Homomorphismus) ("linear map") is a map $\varphi: V \rightarrow W$ between $K$-vector spaces $V$ and $W$ (over a common base field $K$) with the following properties:


1. **Additivity**: $\varphi(v+u)=\varphi(v)+\varphi(u)$ for all $u, v \in V$

2. **Homogeneity**: $\varphi(\lambda \cdot v)=\lambda \cdot \varphi(v) \quad$ for all $\lambda \in K$ and $v \in V$ (scalar multiplication)

* Example: For a linear map it does not matter whether one first adds two vectors and then maps their sum, or first maps the vectors and then adds their images. The same holds for multiplication by a scalar from the base field.

* In functional analysis, when studying infinite-dimensional vector spaces that carry a topology, one usually speaks of linear operators rather than [linear maps](https://de.m.wikipedia.org/wiki/Lineare_Abbildung).


![xyz](https://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Reflection_of_a_triangle_about_the_y_axis.svg/320px-Reflection_of_a_triangle_about_the_y_axis.svg.png)

*Reflection about the y-axis as an example of a linear map*
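The two defining properties of a linear map, additivity and homogeneity, can be spot-checked for the reflection shown in the figure. The sketch below assumes the reflection is $(x, y) \mapsto (-x, y)$ on $\mathbb{R}^2$ and uses arbitrarily chosen sample vectors; the helper names are illustrative.

```python
def reflect_y(v):
    """Reflect a point of R^2 about the y-axis: (x, y) -> (-x, y)."""
    x, y = v
    return (-x, y)

def add(u, v):
    """Componentwise vector addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(lam, v):
    """Scalar multiplication in R^2."""
    return (lam * v[0], lam * v[1])

u, v, lam = (1, 2), (3, -1), 5   # sample data, chosen arbitrarily

# Additivity: phi(u + v) == phi(u) + phi(v)
additive = reflect_y(add(u, v)) == add(reflect_y(u), reflect_y(v))

# Homogeneity: phi(lam * v) == lam * phi(v)
homogeneous = reflect_y(scale(lam, v)) == scale(lam, reflect_y(v))

# Reflecting twice gives back the original point, so the map is its own inverse.
involution = reflect_y(reflect_y(u)) == u

print(additive, homogeneous, involution)  # True True True
```

Since the reflection is its own inverse, it is bijective, which makes it an automorphism of $\mathbb{R}^2$ and not merely an endomorphism.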

**Examples of vector space homomorphisms**

* For $V=W=\mathbb{R}$, every linear map has the form $f(x)=m x$ with $m \in \mathbb{R}$.

* Let $V=\mathbb{R}^{n}$ and $W=\mathbb{R}^{m}$. Then every $m \times n$ matrix $A$ defines, via matrix multiplication, a linear map $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ by

$f(x)=A x=\left(\begin{array}{ccc}a_{11} & \cdots & a_{1 n} \\ \vdots & & \vdots \\ a_{m 1} & \cdots & a_{m n}\end{array}\right)\left(\begin{array}{c}x_{1} \\ \vdots \\ x_{n}\end{array}\right)$

Every linear map from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ can be represented in this way.
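The matrix representation works in both directions: a matrix defines a linear map, and the matrix of a linear map can be recovered because its $j$-th column is the image of the $j$-th standard basis vector. A minimal sketch of both directions, with a hypothetical $2 \times 3$ example matrix and illustrative helper names `matvec` and `matrix_of`:

```python
def matvec(A, x):
    """Multiply an m x n matrix A (list of rows) by a vector x of length n."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matrix_of(f, n):
    """Recover the matrix of a linear map f: R^n -> R^m from the images of the basis vectors."""
    # The j-th column of the matrix is f(e_j), where e_j is the j-th standard basis vector.
    columns = [f([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

# Example matrix (an assumption for illustration): a map R^3 -> R^2.
A = [[1, 2, 0],
     [0, -1, 3]]
f = lambda x: matvec(A, x)

print(matvec(A, [1, 1, 1]))   # [3, 2]
print(matrix_of(f, 3) == A)   # True
```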

See also: [Group homomorphism](https://de.m.wikipedia.org/wiki/Gruppenhomomorphismus)

**Special morphisms:**

In some categories, morphisms (in particular isomorphisms) have special names.

* A [homeomorphism](https://de.m.wikipedia.org/wiki/Homöomorphismus) is an isomorphism between topological spaces. If two spaces are homeomorphic, then (for example) their [fundamental groups](https://de.m.wikipedia.org/wiki/Fundamentalgruppe) are isomorphic. The converse does not hold in general: a line and a single point both have trivial fundamental group but are not homeomorphic.

* A [diffeomorphism](https://de.m.wikipedia.org/wiki/Diffeomorphismus) is an isomorphism between differentiable manifolds.

* A (bijective) [isometry](https://de.m.wikipedia.org/wiki/Isometrie) is an isomorphism in the category of metric spaces with non-expansive (and hence continuous) maps as morphisms.
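The isometry case can be made concrete in the Euclidean plane. The sketch below assumes two example maps that are not taken from the text: a rotation, which preserves all distances, and a scaling by $1/2$, which is non-expansive (so it is a morphism in this category) but not an isomorphism, because its inverse doubles distances. The helper names and sample points are illustrative.

```python
import math

def dist(p, q):
    """Euclidean distance between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rotate(p, theta):
    """Rotate a point of R^2 about the origin by the angle theta (radians)."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def shrink(p):
    """Scale a point of R^2 by 1/2: non-expansive, but not distance-preserving."""
    return (p[0] / 2, p[1] / 2)

p, q, theta = (1.0, 2.0), (-3.0, 0.5), 0.7   # sample data, chosen arbitrarily

# A rotation preserves distances (up to floating-point rounding).
rotation_is_isometry = math.isclose(dist(rotate(p, theta), rotate(q, theta)), dist(p, q))

# The shrinking map never increases distances, but it does not preserve them.
shrink_is_nonexpansive = dist(shrink(p), shrink(q)) <= dist(p, q)
shrink_is_isometry = math.isclose(dist(shrink(p), shrink(q)), dist(p, q))

print(rotation_is_isometry, shrink_is_nonexpansive, shrink_is_isometry)  # True True False
```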