<a href="https://colab.research.google.com/github/ad17171717/YouTube-Tutorials/blob/main/Machine%20Learning%20with%20Python/Optical_Character_Recognition_(OCR)_with_Meta's_Nougat!.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Nougat**

**Nougat is an encoder-decoder transformer model that can parse through PDFs to extract text, LaTeX math and tables. Nougat is built using the Document Understanding Transformer (DONUT) architecture. The models uses a visual encoder that crops the image to a specified size and outputs a sequence of embedded patches. The encoded image is decoded into a sequence of tokens using a transformer decoder. The team at Meta trained Nougat on over a million articles from arXiv, PubMed Central and the Industry Documents Library.**

**Nougat outputs the information from PDFs into a Multimarkdown file output.**

**Nougat can be trained or fine-tuned on a sepcified data set.**

<sup>Source: [Nougat: Neural Optical Understanding for Academic Documents](https://github.com/facebookresearch/nougat) GitHub Repository</sup>

<sup>Source: [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) Paper on arXiv</sup>

In [1]:
from IPython import display
import os

In [2]:
!pip install git+https://github.com/facebookresearch/nougat
display.clear_output()

In [3]:
!nougat -h

usage: nougat
       [-h]
       [--batchsize BATCHSIZE]
       [--checkpoint CHECKPOINT]
       [--out OUT]
       [--recompute]
       [--markdown]
       pdf
       [pdf ...]

positional arguments:
  pdf
    PDF(s) to
    process.

options:
  -h, --help
    show this
    help
    message and
    exit
  --batchsize BATCHSIZE, -b BATCHSIZE
    Batch size
    to use.
  --checkpoint CHECKPOINT, -c CHECKPOINT
    Path to
    checkpoint
    directory.
  --out OUT, -o OUT
    Output
    directory.
  --recompute
    Recompute
    already
    computed
    PDF,
    discarding
    previous pr
    edictions.
  --markdown
    Add postpro
    cessing
    step for
    markdown co
    mpatibility
    .


## **Converting a Native PDF File**

In [4]:
!curl -o quantum_physics.pdf https://www.sydney.edu.au/science/chemistry/~mjtj/CHEM3117/Resources/postulates.pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69838  100 69838    0     0  30036      0  0:00:02  0:00:02 --:--:-- 30037


<sup>Source: [The Postulates of Quantum Mechanics](https://www.sydney.edu.au/science/chemistry/~mjtj/CHEM3117/Resources/postulates.pdf) from the University of Sydney</sup>

In [5]:
!nougat --markdown pdf '/content/quantum_physics.pdf' --out 'physics'

downloading nougat checkpoint version 0.1.0-small to path /root/.cache/torch/hub/nougat
config.json: 100% 557/557 [00:00<00:00, 3.36Mb/s]
pytorch_model.bin: 100% 956M/956M [00:04<00:00, 209Mb/s]
special_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 470kb/s]
tokenizer.json: 100% 2.04M/2.04M [00:00<00:00, 9.40Mb/s]
tokenizer_config.json: 100% 106/106 [00:00<00:00, 521kb/s]
INFO:root:Output directory does not exist. Creating output directory.
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0% 0/1 [00:00<?, ?it/s][nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.
INFO:root:Processing file /content/quantum_physics.pdf with 2 pages
100% 1/1 [00:17<00:00, 17.78s/it]


In [6]:
display.Markdown('/content/physics/quantum_physics.mmd')

## The Postulates of Quantum Mechanics

There are six postulates of quantum mechanics.

### Postulate 1

The state of a quantum mechanical system is completely specified by the function \(\Psi({\bf r},t)\) that depends on the coordinates of the particle, \({\bf r}\) and the time, \(t\). This function is called the wavefunction or state function and has the property that \(\Psi^{*}({\bf r},t)\Psi({\bf r},t)d\tau\) is the probability that the particle lies in the volume element \(d\tau\) located at \({\bf r}\) and time \(t\).

This is the _probabalistic_ interpretation of the wavefunction. As a result the wavefunction must satisfy the condition that finding the particle _somewhere_ in space is 1 and this gives us the normalisation condition,

\[\int_{-\infty}^{+\infty}\Psi^{*}({\bf r},t)\Psi({\bf r},t)d\tau=1\]

The other conditions on the wavefunction that arise from the probabilistic interpretation are that it must be single-valued, continuous and finite. We normally write wavefunctions with a normalisation constant included.

### Postulate 2

To every observable in classical mechanics there corresponds a linear, Hermitian operator in quantum mechanics.

This postulate comes from the observation that the expectation value of an operator that corresponds to an observable must be real and therefore the operator must be Hermitian. Some examples of Hermitian operators are:

\begin{tabular}{l l l l}
**Observable** & **Classical Symbol** & **Quantum Operator** & **Operation** \\ position & \({\bf r}\) & \(\hat{r}\) & multiply by \({\bf r}\) \\ momentum & \({\bf p}\) & \(\hat{p}\) & \(-i\hbar(\frac{\partial}{\partial x}+\hat{j}\frac{\partial}{\partial y}+\hat{k} \frac{\partial}{\partial z})\) \\ kinetic energy & \(T\) & \(\hat{T}\) & \(\frac{-\hbar^{2}}{2m}(\frac{\partial^{2}}{\partial x^{2}}+\frac{\partial^{2}} {\partial y^{2}}+\frac{\partial^{2}}{\partial z^{2}})\) \\ potential energy & \(V({\bf r})\) & \(\hat{V}({\bf r})\) & multiply by \(V({\bf r})\) \\ total energy & \(E\) & \({\cal H}\) & \(\frac{-\hbar^{2}}{2m}(\frac{\partial^{2}}{\partial x^{2}}+\frac{\partial^{2}} {\partial y^{2}}+\frac{\partial^{2}}{\partial z^{2}})+V({\bf r})\) \\ angular momentum & \(l_{x}\) & \(\hat{l}_{x}\) & \(-i\hbar(\frac{\partial}{\partial z}-z\frac{\partial}{\partial y})\) \\  & \(l_{y}\) & \(\hat{l}_{y}\) & \(-i\hbar(x\frac{\partial}{\partial x}-x\frac{\partial}{\partial z})\) \\  & \(l_{z}\) & \(\hat{l}_{z}\) & \(-i\hbar(x\frac{\partial}{\partial y}-y\frac{\partial}{\partial x})\) \\ \end{tabular}

### Postulate 3

In any measurement of the observable associated with operator \(\hat{A}\), the only values that will ever be observed are the eigenvalues, \(a\), that satisfy the eigenvalue equation,

\[\hat{A}\Psi=a\Psi\]This is the postulate that the values of dynamical variables are quantized in quantum mechanics (although it is possible to have a continuum of eigenvalues in the case of unbound states). If the system is in an eigenstate of \(\hat{A}\) with eigenvalue \(a\) then any measurement of the quantity \(A\) will always yield the value \(a\).

Although measurement will always yield a value, the initial state does not have to be an eigenstate of \(\hat{A}\). An arbitrary state can be expanded in the complete set of eigenvectors of \(\hat{A}\), \(\hat{A}\Psi_{i}=a_{i}\Psi_{i}\), as

\[\Psi=\sum_{i}^{n}c_{i}\Psi_{i}\]

where \(n\) may go to infinity. In this case measurement of \(A\) will yield _one_ of the eigenvalues, \(a_{i}\), but we don't know which one. The _probability_ of observing the eigenvalue \(a_{i}\) is given by the absolute value of the square of the coefficient, \(|\!|c_{i}|\!|^{2}\). The third postulate also implies that, after the measurement of \(\Psi\) yields some value, \(a_{i}\), the wavefunction _collapses_ into the eigenstate, \(\Psi_{i}\) that corresponds to \(a_{i}\). If \(a_{i}\) is degenerate \(\Psi\) collapses onto the degenerate subspace. Thus the act of measurement affects the state of the system and this has been used in many elegant experimental explorations of quantum mechanics (eg Bell's theorem).

## Postulate 4

If a system is in a state described by the normalised wavefunction, \(\Psi\), then the average value of the observable corresponding to \(\hat{A}\) is given by

\[<\hat{A}>=\int_{-\infty}^{+\infty}\Psi^{*}\hat{A}\Psi d\tau\]

## Postulate 5

The wavefunction or state function of a system evolves in time according to the time-dependent Schrodinger equation

\[{\cal H}\Psi({\bf r},t)=i\hbar\frac{\partial\Psi}{\partial t}\]

## Postulate 6

The total wavefunction must be antisymmetric with respect to the interchange of all coordinates of one fermion with those of another. Electronic spin must be included in this set of coordinates.

The Pauli exclusion principle is a direct result of this _antisymmetry_ postulate.

## **Converting a Scanned PDF**

In [7]:
!curl -o fundamental_quantum_equations.pdf https://www.informationphilosopher.com/solutions/scientists/dirac/Fund_QM_1925.pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1196k  100 1196k    0     0  3618k      0 --:--:-- --:--:-- --:--:-- 3627k


<sup>Source: [The Fundamental Equations of Quantum Mechanics](https://www.informationphilosopher.com/solutions/scientists/dirac/Fund_QM_1925.pdf) by Paul Dirac from the Proceedings of the Royal Society of London</sup>

In [8]:
!nougat --markdown pdf '/content/fundamental_quantum_equations.pdf' --out 'physics'

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO: likely hallucinated title at the end of the page: ## 2 x 2
INFO:root:Processing file /content/fundamental_quantum_equations.pdf with 13 pages
100% 4/4 [01:03<00:00, 15.87s/it]


In [9]:
display.Markdown('/content/physics/fundamental_quantum_equations.mmd')

THE

ROYAL

SOCIETY

Publishing

The Fundamental Equations of Quantum Mechanics

Author(s): P. A. M. Dirac

Source: _Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character_, Vol. 109, No. 752 (Dec. 1, 1925), pp. 642-653

Published by: Royal Society

Stable URL: [http://www.jstor.org/stable/9441](http://www.jstor.org/stable/9441)

Accessed: 18-12-2017 18:24 UTC

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at [http://about.jstor.org/terms](http://about.jstor.org/terms)

Royal Society is collaborating with JSTOR to digitize, preserve and extend access to _Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character_

JSTOR_The Fundamental Equations of Quantum Mechanics._

By P. A. M. Dirac, 1851 Exhibition Senior Research Student, St. John's College, Cambridge.

(Communicated by R. H. Fowler, F.R.S.--Received November 7th, 1925.)

## 1 Introduction.

It is well known that the experimental facts of atomic physics necessitate a departure from the classical theory of electrodynamics in the description of atomic phenomena. This departure takes the form, in Bohr's theory, of the special assumptions of the existence of stationary states of an atom, in which it does not radiate, and of certain rules, called quantum conditions, which fix the stationary states and the frequencies of the radiation emitted during transitions between them. These assumptions are quite foreign to the classical theory, but have been very successful in the interpretation of a restricted region of atomic phenomena. The only way in which the classical theory is used is through the assumption that the classical laws hold for the description of the motion in the stationary states, although they fail completely during transitions, and the assumption, called the Correspondence Principle, that the classical theory gives the right results in the limiting case when the action per cycle of the system is large compared to Planck's constant \(h\), and in certain other special cases.

In a recent paper* Heisenberg puts forward a new theory, which suggests that it is not the equations of classical mechanics that are in any way at fault, but that the mathematical operations by which physical results are deduced from them require modification. _All_ the information supplied by the classical theory can thus be made use of in the new theory.

## 2 Quantum Algebra.

Consider a multiply periodic non-degenerate dynamical system of \(u\) degrees of freedom, defined by equations connecting the co-ordinates and their time differential coefficients. We may solve the problem on the classical theory in the following way. Assume that each of the co-ordinates \(x\) can be expanded in the form of a multiple Fourier series in the time \(t\), thus,

\[x=\Sigma_{a_{1}...a_{n}}x\left(\alpha_{1}\alpha_{2}\...\ \alpha_{u}\right)\,\exp.\ i\left(\alpha_{1} \alpha_{1}\dashv\alpha_{2}\omega_{2}\dashv...\dashv\alpha_{u}\omega_{u} \right)t\]

\[=\Sigma_{a}x_{a}\,\exp.\ i\left(\alpha\alpha\right)t,\]

* Heisenberg, 'Zeits. f. Phys.,' vol. 33, p. 879 (1925).

say, for brevity. Substitute these values in the equations of motion, and equate the coefficients on either side of each harmonic term. The equations obtained in this way (which we shall call the A equations) will determine each of the amplitudes \(x_{\mathsf{a}}\) and frequencies (\(\varkappa\omega\)), (the frequencies being measured in radians per unit time). The solution will not be unique. There will be a \(u\)-fold infinity of solutions, which may be labelled by taking the amplitudes and frequencies to be functions of \(u\) constants \(\kappa_{1}\,...\,\kappa_{\mathsf{u}}\). Each \(x_{\mathsf{a}}\) and (\(\varkappa\omega\)) is now a function of two sets of numbers, the \(\mathsf{a}\)'s and the \(\kappa\)'s, and may be written \(x_{\mathsf{a}\mathsf{e}},(\varkappa\omega)_{\mathsf{a}}\).

In the quantum solution of the problem, according to Heisenberg, we still assume that each co-ordinate can be represented by harmonic components of the form \(\exp.\,i\omega t\), the amplitude and frequency of each depending on two sets of numbers \(n_{1}\,...\,n_{\mathsf{u}}\) and \(m_{1}\,...\,m_{\mathsf{u}}\), in this case all integers, and being written \(x\,(nm)\), \(\omega\,(nm)\). The differences \(n_{\mathsf{r}}\,-\,m_{\mathsf{r}}\) correspond to the previous \(\varkappa_{\mathsf{r}}\), but neither the \(n\)'s nor any functions of the \(n\)'s and \(m\)'s play the part of the previous \(\kappa\)'s in pointing out to which solution each particular harmonic component belongs. We cannot, for instance, take together all the components for which the \(n\)'s have a given set of values, and say that these by themselves form a single complete solution of the equations of motion. The quantum solutions are all interlocked, and must be considered as a single whole. The effect of this mathematically is that, while on the classical theory each of the A equations is a relation between amplitudes and frequencies having one particular set of \(\kappa\)'s, the amplitudes and frequencies occurring in a quantum A equation do not have one particular set of values for the \(n\)'s, or for any functions of the \(n\)'s and \(m\)'s, but have their \(n\)'s and \(m\)'s related in a special way, which will appear later.

On the classical theory we have the obvious relation

\[(\varkappa\omega)_{\mathsf{a}}+(\beta\omega)_{\mathsf{a}}=(\alpha+\beta,\ \omega)_{\mathsf{a}}.\]

Following Heisenberg, we assume that the corresponding relation on the quantum theory is

\[\omega\,(n,\,n\,-\,\alpha)+\omega\,(n\,-\,\alpha,\,n\,-\,\alpha-\beta)=\omega \,(n,\,n\,-\,\alpha-\,\beta)\]

or

\[\omega\,(nm)+\omega\,(mk)=\omega\,(nk). \tag{1}\]

This means that \(\omega\,(nm)\) is of the form \(\Omega\,(n)\,-\,\Omega\,(m)\), the \(\Omega\)'s being frequency levels. On Bohr's theory these would be \(2\pi/h\) times the energy levels, but we do not need to assume this.

[MISSING_PAGE_EMPTY:4]

former by \(x\) in front and divide by \(x\) behind we get the latter. In a similar way the square root of \(x\) may be defined by

\[\sqrt[x]{x}\cdot\sqrt[x]{x}=x. \tag{4}\]

It is not obvious that there always should be solutions to (3) and (4). In particular, one may have to introduce sub-harmonics, _i.e._, new intermediate frequency levels, in order to express \(\sqrt[x]{x}\). One may evade these difficulties by rationalising and multiplying up each equation before interpreting it on the quantum theory and obtaining the A equations from it.

We are now able to take over each of the equations of motion of the system into the quantum theory provided we can decide the correct order of the quantities in each of the products. Any equation deducible from the equations of motion by algebraic processes not involving the interchange of the factors of a product, and by differentiation and integration with respect to \(t\), may also be taken over into the quantum theory. In particular, the energy equation may be thus taken over.

The equations of motion do not suffice to solve the quantum problem. On the classical theory the equations of motion do not determine the \(x_{\infty}\), \((\alpha\omega)_{\kappa}\) as functions of the \(\kappa\)'s until we assume something about the \(\kappa\)'s which serves to define them. We could, if we liked, complete the solution by choosing the \(\kappa\)'s such that \(\partial\mbox{E}/\partial\kappa_{r}=\omega_{r}/2\pi\), where E is the energy of the system, which would make the \(\kappa_{r}\) equal the action variables \(\mbox{J}_{r}\). There must be corresponding equations on the quantum theory, and these constitute the quantum conditions.

## 3 Quantum Differentiation

Up to the present the only differentiation that we have considered on the quantum theory is that with respect to the time \(t\). We shall now determine the form of the most general quantum operation \(d/dv\) that satisfies the laws

\[\frac{d}{dv}\;(x+y)=\frac{d}{dv}\;x+\frac{d}{dv}\,y,\] (I)

and

\[\frac{d}{dv}\,(xy)=\frac{d}{dv}\;x\cdot y+x\cdot\frac{d}{dv}\,y.\] (II)

(Note that the order of \(x\) and \(y\) is preserved in the last equation.)

The first of these laws requires that the amplitudes of the components of \(dx/dv\) shall be linear functions of those of \(x\), _i.e._,

\[dx/dv\;(nm)=\Sigma_{nm}\;\;a\;(nm\;;\;n^{\prime}m^{\prime})\;x\;(n^{\prime}m^ {\prime}). \tag{5}\]There is one coefficient \(a\) (\(nm\;;\,n^{\prime}m^{\prime}\)) for any four sets of integral values for the \(n\)'s, \(m\)'s, \(n\)'s and \(m^{\prime}\)'s. The second law imposes conditions on the \(a\)'s. Substitute for the differential coefficients in \(\Pi\) their values according to (5) and equate the (\(nm\)) components on either side. The result is

\[\Sigma_{\mbox{\tiny{\tt{nm}}}\mbox{\tiny{\tt{$\times$}}}}\;a\,(nm\;; \,n^{\prime}m^{\prime})\;x\;(n^{\prime}k)\;y\;(km^{\prime})=\Sigma_{\mbox{\tiny {\tt{km}}}\mbox{\tiny{\tt{$\times$}}}}\;a\,(nk\;;\;n^{\prime}k^{\prime})\;x\;(n^ {\prime}k^{\prime})\;y\;(km)\\ +\Sigma_{\mbox{\tiny{\tt{kk^{\prime}}}}}\;x\;(nk)\;a\,(km\;;\;k^{ \prime}m^{\prime})\;y\;(k^{\prime}m^{\prime}).\]

This must be true for all values of the amplitudes of \(x\) and \(y\), so that we can equate the coefficients of \(x\;(n^{\prime}k)\;y\;(k^{\prime}m^{\prime})\) on either side. Using the symbol \(\delta_{\mbox{\tiny{\tt{mm}}}}\) to have the value unity when \(m=n\) (\(i.e.\), when each \(m_{r}=n_{r}\)) and zero when \(m\neq n\), we get

\[\delta_{\mbox{\tiny{\tt{kk^{\prime}}}}}\;a\,(nm\;;\;n^{\prime}m^{\prime})= \delta_{\mbox{\tiny{\tt{mm}}}}\;a\,(nk^{\prime}\;;\;n^{\prime}k)+\,\delta_{ \mbox{\tiny{\tt{nn^{\prime}}}}}\;a\,(km\;;\;k^{\prime}m^{\prime}).\]

To proceed further, we have to consider separately the various cases of equality and inequality between the \(kk^{\prime}\), \(nm^{\prime}\) and \(nn^{\prime}\).

Take first the case when \(k=k^{\prime}\), \(m\neq m^{\prime}\), \(n\neq n^{\prime}\). This gives

\[a\;(nm\;;\;n^{\prime}m^{\prime})\;=\;0.\]

Hence all the \(a\,(nm\;;\;n^{\prime}m^{\prime})\) vanish except those for which either \(n=n^{\prime}\) or \(m=m^{\prime}\) (or both). The cases \(k\neq k^{\prime}\), \(m=m^{\prime}\), \(n\neq n^{\prime}\) and \(k\neq k^{\prime}\), \(m\neq m^{\prime}\), \(n=n^{\prime}\) do not give us anything new. Now take the case \(k=k^{\prime}\), \(m=m^{\prime}\), \(n\neq n^{\prime}\). This gives

\[a\;(nm\;;\;n^{\prime}m)\;=\;a\,(nk\;;\;n^{\prime}k).\]

Hence \(a\,(nm\;;\;n^{\prime}m)\) is independent of \(m\) provided \(n\neq n^{\prime}\). Similarly, the case \(k=k^{\prime}\), \(m\neq m^{\prime}\), \(n=n^{\prime}\) tells us that \(a\;(nm\;;\;nm^{\prime})\) is independent of \(n\) provided \(m\neq m^{\prime}\). The case \(k\neq k^{\prime}\), \(m=m^{\prime}\), \(n=n^{\prime}\) now gives

\[a\;(nk^{\prime}\;;\;nk)+\,a\;(km\;;\;k^{\prime}m)\;=\;0.\]

We can sum up these results by putting

\[a\,(nk^{\prime}\;;\;nk)=a\,(kk^{\prime})==a\;(km\;;\;k^{\prime}m), \tag{6}\]

provided \(k\neq k^{\prime}\). The two-index symbol \(a\,(kk^{\prime})\) depends, of course, only on the two sets of integers \(k\) and \(k^{\prime}\). The only remaining case is \(k=k^{\prime}\), \(m=m^{\prime}\), \(n=n^{\prime}\), which gives

\[a\;(nm\;;\;nm)=a\;(nk\;;\;nk)+\,a\;(km\;;\;km).\]

This means we can put

\[a\;(nm\;;\;nm)=a\;(mn)-a\;(nn). \tag{7}\]

Equation (7) completes equation (6) by defining \(a\,(kk^{\prime})\) when \(k=k^{\prime}\).

Equation (5) now reduces to

\[dx/dv\left(nm\right) =\Sigma_{m^{\prime}\,\neq\,m}\,a\left(nm\,;\,nm^{\prime}\right)x \left(nm^{\prime}\right)\,+\,\Sigma_{m^{\prime}\,\neq\,n}\,a\left(nm\,;\,n^{ \prime}m\right)x\left(n^{\prime}m\right)\] \[+a\left(nm\,;\,nm\right)x\left(nm\right)\] \[=\Sigma_{m^{\prime}\,\neq\,m}\,a\left(m^{\prime}m\right)x\left(nm^ {\prime}\right)-\,\Sigma_{n^{\prime}\,\neq\,n}\,a\left(nm^{\prime}\right)x \left(n^{\prime}m\right)\] \[+\{a\left(mm\right)\,\to a\left(nn\right)\}x\left(nm\right)\] \[=\Sigma_{k}\,\{x\left(nk\right)a\left(km\right)\,\to a\left(nk \right)x\left(km\right)\}.\]

Hence

\[dx/dv=xa-ax. \tag{8}\]

Thus the most general operation satisfying the laws I and II that one can perform upon a quantum variable is that of taking the difference of its Heisenberg products with some other quantum variable. It is easily seen that one cannot in general change the order of differentiations, _i.e._,

\[\frac{d^{2}x}{du\,dv}\neq\frac{d^{2}x}{dv\,du}.\]

As an example in quantum differentiation we may take the case when \(\left(a\right)\) is a constant, so that \(a\left(nm\right)=0\) except when \(n=m\). We get

\[dx/dv\left(nm\right)=x\left(nm\right)a\left(mm\right)-a\left(nn\right)x\left( nm\right).\]

In particular, if \(ia\left(nm\right)=\Omega\left(m\right)\), the frequency level previously introduced, we have

\[dx/dv\left(nm\right)=i\omega\left(nm\right)x\left(nm\right)\]

and our differentiation with respect to \(v\) becomes ordinary differentiation with respect to \(t\).

## 4 The Quantum Conditions

We shall now consider to what the expression \(\left(xy-yx\right)\) corresponds on the classical theory. To do this we suppose that \(x\left(n,n-\alpha\right)\) varies only slowly with the \(n\)'\(s\), the \(n\)'\(s\) being large numbers and the \(\alpha\)'s small ones, so that we can put

\[x\left(n,\,n-\alpha\right)=x_{\alpha\alpha}\]

where \(\kappa_{r}=n_{r}h\) or \(\left(n_{r}+\alpha_{r}\right)h\), these being practically equivalent. We now have \(x\left(n,\,n-\alpha\right)\,y\left(n-\alpha,\,n-\alpha-\beta\right)-y\left(n, \,n-\beta\right)x\left(n-\beta,\,n-\alpha-\beta\right)\)

\[=\{x\left(n,\,n-\alpha\right)-x\left(n-\beta,\,n-\beta-\alpha \right)\}\,y\left(n-\alpha,\,n-\alpha-\beta\right)\] \[-\{y\left(n,\,n-\beta\right)-y\left(n-\alpha,\,n-\alpha-\beta \right)\}\,x\left(n-\beta,\,n-\alpha-\beta\right).\] \[=h\Sigma_{r}\left\{\beta_{r}\,\frac{\partial x_{\alpha\alpha}}{ \partial\kappa_{r}}\,y_{\beta\alpha}-\alpha_{r}\frac{\partial y_{\alpha\alpha }}{\partial\kappa_{r}}x_{\alpha\alpha}\right\}. \tag{9}\]Now

\[2\pi i\beta_{p}y_{\beta}\exp.i\left(\beta\omega\right)\ell=\frac{\partial}{ \partial w_{r}}\left(y_{\beta}\exp.i\left(\beta\omega\right)t\right.\]

where the \(w_{r}\) are the angle variables, equal to \(\omega_{r}\!/2\pi\). Hence the \((nm)\) component of \((xy-yx)\) corresponds on the classical theory to

\[\frac{i\hbar}{2\pi}\Sigma_{a+\beta-n-m}\Sigma_{r}\biggl{\{} \frac{\partial}{\partial\kappa_{r}}\left\{x_{a}\exp.i\left(\alpha\omega\right)t \right\}\frac{\partial}{\partial w_{r}}\left\{y_{\beta}\exp.i\left(\beta\omega \right)t\right\}\] \[-\frac{\partial}{\partial\kappa_{r}}\left\{y_{\beta}\exp.i\left( \beta\omega\right)t\right\}\frac{\partial}{\partial w_{r}}\left\{x_{a}\exp.i \left(\alpha\omega\right)t\right\}\biggr{\}}\]

or \((xy-yx)\) itself corresponds to

\[-\frac{i\hbar}{2\pi}\Sigma_{r}\left\{\frac{\partial x}{\partial\kappa_{r}} \frac{\partial y}{\partial w_{r}}-\frac{\partial y}{\partial\kappa_{r}}\frac {\partial x}{\partial w_{r}}\right\}.\]

If we make the \(\kappa_{r}\) equal the action variables \(\mathrm{J}_{r}\), this becomes \(i\hbar/2\pi\) times the Poisson (or Jacobi) bracket expression

\[[x,y]=\Sigma_{r}\left\{\frac{\partial x}{\partial w_{r}}\frac{\partial y}{ \partial\mathrm{J}_{r}}-\frac{\partial y}{\partial w_{r}}\frac{\partial x}{ \partial\mathrm{J}_{r}}\right\}=\Sigma_{r}\left\{\frac{\partial x}{\partial _{r}}\frac{\partial y}{\partial\mathrm{J}_{r}}-\frac{\partial y}{\partial_{r} }\frac{\partial x}{\partial\mathrm{J}_{r}}\right\}\]

where the \(p\)'s and \(q\)'s are any set of canonical variables of the system.

The elementary Poisson bracket expressions for various combinations of the \(p\)'s and \(q\)'s are

\[\begin{array}{ccc}[q_{r},q_{s}]=0,&[p_{r},p_{s}]=0,&\\ &\\ &=\,\delta_{rs}=0&(r\neq s)\\ &=1.&(r=s)\end{array} \tag{10}\]

The general bracket expressions satisfy the laws I and II, which now read

\[[x,z]+[y,z]=[x+y,z],\]

\[[xy,z]=[x,z]\,y+x\,[y,z].\]

By means of these laws, together with \([x,y]=-[y,x]\), if \(x\) and \(y\) are given as algebraic functions of the \(p\), and \(q_{r}\), \([x,y]\) can be expressed in terms of the \([q_{r},q_{s}]\), \([p_{r},p_{s}]\) and \([q_{r},p_{s}]\), and thus evaluated, without using the commutative law of multiplication (except in so far as it is used implicitly on account of the proof of IIA requiring it). The bracket expression \([x,y]\) thus has a meaning on the quantum theory when \(x\) and \(y\) are quantum variables, if we take the elementary bracket expressions to be still given by (10).

We make the fundamental assumption that _the difference between the Heisenberg products of two quantum quantities is equal to \(i\hbar/2\pi\) times their Poisson bracket expression_. In symbols,

\[xy-yx=i\hbar/2\pi\cdot[x,y]. \tag{11}\]We have seen that this is equivalent, in the limiting case of the classical theory, to taking the arbitrary quantities \(\kappa_{r}\) that label a solution equal to the J\({}_{r}\), and it seems reasonable to take (11) as constituting the general quantum conditions.

It is not obvious that all the information supplied by equation (11) is consistent. Owing to the fact that the quantities on either side of (11) satisfy the same laws I and II or Ia and II\({}_{A}\), the only independent conditions given by (11) are those for which \(x\) and \(y\) are \(p\)'s or \(q\)'s, namely

\[\left.\begin{array}{c}q_{\alpha}q_{\alpha}-q_{\alpha}q_{\alpha}=0\\ p_{\alpha}p_{\alpha}-p_{\alpha}p_{\alpha}=0\\ q_{\alpha}p_{\alpha}-p_{\alpha}q_{\alpha}=\delta_{\alpha_{i}}\,ik/2\pi\end{array} \right\}. \tag{12}\]

If the only grounds for believing that the equations (12) were consistent with each other and with the equations of motion were that they are known to be consistent in the limit when \(h\!\succ\!0\), the case would not be very strong, since one might be able to deduce from them the inconsistency that \(h\!=\!0\), which would not be an inconsistency in the limit. There is much stronger evidence than this, however, owing to the fact that the classical operations obey the same laws as the quantum ones, so that if, by applying the quantum operations, one can get an inconsistency, by applying the classical operations in the same way one must also get an inconsistency. If a series of classical operations leads to the equation \(0\!=\!0\), the corresponding series of quantum operations must also lead to the equation \(0\!=\!0\), and not to \(h\!=\!0\), since there is no way of obtaining a quantity that does not vanish by a quantum operation with quantum variables such that the corresponding classical operation with the corresponding classical variables gives a quantity that does vanish. The possibility mentioned above of deducing by quantum operations the inconsistency \(h\!=\!0\) thus cannot occur. _The correspondence between the quantum and classical theories lies not so much in the limiting agreement when \(h\!\succ\!0\) as in the fact that the mathematical operations on the two theories obey in many cases the same laws_.

For a system of one degree of freedom, if we take \(p\!=\!m\dot{q}\), the only quantum condition is

\[2\pi m\,(q\dot{q}-\dot{q}q)=ih.\]

Equating the constant part of the left-hand side to \(ih\), we get

\[4\pi m\,\Sigma_{h}\,q\,(nk)\,q\,(kn)\,\omega\,(kn)=h.\]This is equivalent to Heisenberg's quantum condition.* By equating the remaining components of the left-hand side to zero we get further relations not given by Heisenberg's theory.

The quantum conditions (12) get over, in many cases, the difficulties concerning the order in which quantities occurring in products in the equations of motion are to be taken. The order does not matter except when a \(p_{r}\) and \(q\). are multiplied together, and this never occurs in a system describable by a potential energy function that depends only on the \(q\)'s, and a kinetic energy function that depends only on the \(p\)'s.

It may be pointed out that the classical theory quantity occurring in Kramers' and Heisenberg's theory of scattering by atoms+ has components which are of the form (8) (with \(\kappa_{r}=J_{r}\)), and which are interpreted on the quantum theory in a manner in agreement with the present theory. No classical expression involving differential coefficients can be interpreted on the quantum theory unless it can be put into this form.

Footnote †: This content downloaded from 128.103.149.52 on Mon, 18 Dec 2017 18:24:31 UTC

All use subject to [http://about.jstor.org/terms](http://about.jstor.org/terms)

SS 5. _Properties of the Quantum Poisson Bracket Expressions._

In this section we shall deduce certain results that are independent of the assumption of the quantum conditions (11) or (12).

The Poisson bracket expressions satisfy on the classical theory the identity

\[[x,\,y,\,z]\equiv[[x,\,y],\,z]+[[y,\,z],\,x]+[[z,\,x],\,y]=0. \tag{13}\]

On the quantum theory this result is obviously true when \(x\), \(y\) and \(z\) are \(p\)'s or \(q\)'s. Also, from IA and IIA

\[[x_{1}+x_{2},\,y,\,z]=[x_{1},\,y,\,z]+[x_{2},\,y,\,z]\]

and

\[[x_{1},\,x_{2},\,y,\,z]=x_{1}[x_{2},\,y,\,z]+[x_{1},\,y,\,z]\,x_{2}.\]

Hence the result must still be true on the quantum theory when \(x\), \(y\) and \(z\) are expressible in any way as sums and products of \(p\)'s and \(q\)'s, so that it must be generally true. Note that the identity corresponding to (13) when the Poisson bracket expressions are replaced by the differences of the Heisenberg products \((xy-yx)\) is obviously true, so that there is no inconsistency with equation (11).

If H is the Hamiltonian function of the system, the equations of motion may be written classically

\[\dot{p}_{r}=[p_{r}\stackrel{{\cdot}}{{\rm H}}]\qquad\dot{q}_{r}= [q_{r}\,{\rm H}].\]

[MISSING_PAGE_EMPTY:11]

may, for instance, be \(\eta_{r}\xi_{r}=-i/2\pi\cdot J_{r}\), or \(\frac{1}{2}\left(\xi_{r}\eta_{r}+\gamma_{r}\xi_{r}\right)=-i/2\pi\cdot J_{r}\)A detailed investigation of any particular dynamical system is necessary in order to decide what it is. In the event of the last relation being true, we can introduce the set of canonical variables \(\xi_{r}^{\prime}\), \(\gamma_{r}^{\prime}\) defined by

\[\xi_{r}^{\prime}=(\xi_{r}\vdash i\eta_{r})/\surd 2,\qquad\gamma_{r}^{\prime}=(i\xi_{r}+\gamma_{r})/ \surd 2,\]

and shall then have

\[J_{r}=\pi\left(\xi_{r}^{\prime 2}+\gamma_{r}^{\prime 2}\right).\]

This is the case that actually occurs for the harmonic oscillator. In general \(J_{r}\) is not necessarily even a rational function of the \(\xi_{r}\) and \(\gamma_{r}\), an example of this being the rigid rotator considered by Heisenberg.

## 6 The Stationary States.

A quantity C, that does not vary with the time, has all its (\(nm\)) components zero, except those for which \(n=m\). It thus becomes convenient to suppose each set of \(n\)'s to be associated with a definite state of the atom, as on Bohr's theory, so that each C (\(nn\)) belongs to a certain state in precisely the same way in which _every_ quantity occurring in the classical theory belongs to a certain configuration. The components of a varying quantum quantity are so interlocked, however, that it is impossible to associate the sum of certain of them with a given state.

A relation between quantum quantities reduces, when all the quantities are constants, to a relation between C(\(nn\))'s belonging to a definite stationary state \(n\). This relation will be the same as the classical theory relation, on the assumption that the classical laws hold for the description of the stationary states ; in particular, the energy will be the same function of the J's as on the classical theory. We have here a justification for Bohr's assumption of the mechanical nature of the stationary states. It should be noted though, that the variable quantities associated with a stationary state on Bohr's theory, the amplitudes and frequencies of orbital motion, have no physical meaning and are of no mathematical importance.

If we apply the fundamental equation (11) to the quantities \(x\) and \(\mathbf{H}\) we get, with the help of (14),

\[x\;(nm)\;\mathbf{H}\;(mm)\;\mbox{--}\;\mathbf{H}\;(nn)\;x\;(nm)=i\hbar/2\pi\;. \;\dot{x}\;(nm)=\;\mbox{--}\;\hbar/2\pi\;.\;\omega\;(nm)\;x\;(nm),\]

or

\[\mathbf{H}\;(nn)\;\mbox{--}\;\mathbf{H}\;(mm)=\hbar/2\pi\;.\;\omega\;(nm).\]

This is just Bohr's relation connecting the frequencies with the energy differences.

[MISSING_PAGE_EMPTY:13]

## **Batch Processing PDFs**

In [10]:
!mkdir pdfs
!curl -o pdfs/lec_1.pdf https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/7f930e013cef9cd7dec5aa88baa83f0a_MIT8_04S16_LecNotes1.pdf -o pdfs/lec_2.pdf https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/afaef4b8271759d352ac75c4e85eaee6_MIT8_04S16_LecNotes2.pdf
!curl -o pdfs/lec_3.pdf https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/f928b8dce3d6a218fddda9617c5eb4f2_MIT8_04S16_LecNotes3.pdf  -o pdfs/lec_4.pdf https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/0c07cbdc9c352c39eb9539b31ded90d7_MIT8_04S16_LecNotes4.pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  303k  100  303k    0     0   665k      0 --:--:-- --:--:-- --:--:--  665k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  509k  100  509k    0     0  3422k      0 --:--:-- --:--:-- --:--:-- 3443k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  284k  100  284k    0     0   942k      0 --:--:-- --:--:-- --:--:--  942k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  265k  100  265k    0     0  1727k      0 --:--:-- --:--:-- --:--:-- 1738k


<sup>Source: [Quantum Physics I](https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/) from MIT OpenCourseWare</sup>

In [11]:
nougat_cmd = "nougat --markdown --out 'batch_directory'"
pdf_path = '/content/pdfs'

for pdf in os.listdir(pdf_path):
  os.system(f"{nougat_cmd} pdf /content/pdfs/{pdf}")

In [12]:
display.Markdown('/content/batch_directory/lec_1.mmd')

B. Zwiebach

February 9, 2016

**Chapter 1: Key Features of Quantum Mechanics**

_Quantum mechanics is now almost one-hundred years old, but we are still discovering some of its surprising features and it remains the subject of much investigation and speculation. The framework of quantum mechanics is a rich and elegant extension of the framework of classical physics. It is also counterintuitive and almost paradoxical._

Quantum physics has replaced classical physics as the correct fundamental description of our physical universe. It is used routinely to describe most phenomena that occur at short distances. Quantum physics is the result of applying the framework of quantum mechanics to different physical phenomena. We thus have Quantum Electrodynamics, when quantum mechanics is applied to electromagnetism, Quantum Optics, when it is applied to light and optical devices, or Quantum Gravity, when it is applied to gravitation. Quantum mechanics indeed provides a remarkably coherent and elegant framework. The era of quantum physics begins in 1925, with the discoveries of Schrodinger and Heisenberg. The seeds for these discoveries were planted by Planck, Einstein, Bohr, de Broglie, and others. It is a tribute to human imagination that we have been able to discover the counterintuitive and abstract set of rules that define quantum mechanics. Here we aim to explain and provide some perspective on the main features of this framework.

We will begin by discussing the property of linearity, which quantum mechanics shares with electromagnetic theory. This property tells us what kind of theory quantum mechanics is and why, it could be argued, it is simpler than classical mechanics. We then turn to photons, the particles of light. We use photons and polarizers to explain why quantum physics is not deterministic and, in contrast with classical physics, the results of some experiments cannot be predicted. Quantum mechanics is a framework in which we can only predict the _probabilities_ for the various outcomes of any given experiment. Our next subject is quantum superpositions, in which a quantum object somehow manages to exist simultaneously in two mutually incompatible states. A quantum light-bulb, for example, could be in a state in which it is both on and off at the same time!

## 1 Linearity of the equations of motion

In physics a theory is usually described by a set of equations for some quantities called the **dynamical variables** of the theory. After writing a theory, the most important task is finding solutions of the equations. A solution of the equations describes a possible reality, according to the theory. Because an expanding universe is a solution of Albert Einstein's gravitational equations, for example, it follows that an expanding universe is possible, according to this theory. A single theory may have many solutions, each describing a possible reality.

There are linear theories and nonlinear theories. Nonlinear theories are more complex than linear theories. In a linear theory a remarkable fact takes place: if you have two solutions you obtain a third solution of the theory simply by adding the two solutions. An example of a beautiful linear theory is Maxwell's theory of electromagnetism, a theory that governs the behavior of electric and magnetic fields. A field, as you probably know, is a quantity whose values may depend on position and on time. A simple solution of this theory describes an electromagnetic wave propagating in a given direction. Another simple solution could describe an electromagnetic wave propagating in a different direction. Because the theory is linear, having the two waves propagating simultaneously, each in its own direction and without affecting each other, is a new and consistent solution. The sum is a solution in the sense that the electric field in the new solution is the sum of the electric field in the first solution plus the electric field in the second solution. The same goes for the magnetic field: the magnetic field in the new solution is the sum of the magnetic field in the first solution plus the magnetic field in the second solution. In fact you can add any number of solutions to still find a solution. Even if this sounds esoteric, you are totally familiar with it. The air around you is full of electromagnetic waves, each one propagating oblivious to the other ones. There are the waves of thousands of cell phones, the waves carrying hundreds of wireless internet messages, the waves from a plethora of radio-stations, TV stations, and many, many more. Today, a single transatlantic cable can carry simultaneously millions of telephone calls, together with huge amounts video and internet data. All of that courtesy of linearity.

More concretely, we say that Maxwell's equations are **linear** equations. A solution of Maxwell's equation is described by an electric field \({\bf E}\) a magnetic field \({\bf B}\), a charge density \(\rho\) and a current density \({\bf J}\), all collectively denoted as \(({\bf E},\,{\bf B}\,,\rho\,,\,{\bf J})\). This collection of fields and sources satisfy Maxwell's equations. Linearity implies that if \(({\bf E},\,{\bf B}\,,\rho\,,\,{\bf J})\) is a solution so is \((\alpha{\bf E},\,\alpha{\bf B}\,,\alpha\rho\,,\,\alpha{\bf J})\), where all fields and sources have been multiplied by the constant \(\alpha\). Given two solutions

\[({\bf E}_{1},{\bf B}_{1},\rho_{1},{\bf J}_{1})\,,\quad{\rm and}\quad({\bf E}_ {2},{\bf B}_{2},\rho_{2},{\bf J}_{2})\,, \tag{1.1}\]

linearity also implies that we can obtain a new solution by adding them

\[({\bf E}_{1}+{\bf E}_{2}\,,\,\,{\bf B}_{1}+{\bf B}_{2}\,,\,\,\rho_{1}+\rho_{2} \,,\,\,{\bf J}_{1}+{\bf J}_{2})\,. \tag{1.2}\]

The new solution may be called the superposition of the two original solutions.

It is not hard to explain what is, in general, a linear equation or a linear set of equations. Consider the equation

\[L\,u\ =\ 0\,, \tag{1.3}\]

where, schematically, \(u\) denotes the unknown. The unknown may be a number, or a function of time, a function of space, a function of time and space, essentially anything unknown! In fact, \(u\) could represent a collection of unknowns, in which case we would replace \(u\) above by \(u_{1},u_{2},\ldots\). The symbol \(L\) denotes a **linear operator**, an object that satisfies the following two properties

\[L(u_{1}+u_{2})\ =\ Lu_{1}+Lu_{2}\,,\qquad L(a\,u)\ =\ aLu\,, \tag{1.4}\]

where \(a\) is a number. Note that these conditions imply that

\[L(\alpha u_{1}+\beta u_{2})\ =\ \alpha Lu_{1}+\beta Lu_{2}\,, \tag{1.5}\]

showing that if \(u_{1}\) is a solution ( \(Lu_{1}=0\)) and \(u_{2}\) is a solution (\(Lu_{2}=0\)) then \(\alpha u_{1}+\beta u_{2}\) is also a solution. We call \(\alpha u_{1}+\beta u_{2}\) the **general superposition** of the solutions \(u_{1}\) and \(u_{2}\). An example may help. Consider the equation

\[\frac{du}{dt}+\frac{1}{\tau}\,u\ =\ 0\,, \tag{1.6}\]

where \(\tau\) is a constant with units of time. This is, in fact, a linear differential equation, and takes the form \(L\,u=0\) if we define

\[L\,u\ \equiv\ \frac{du}{dt}+\frac{1}{\tau}u \tag{1.7}\]

**Exercise 1**. Verify that (1.7) satisfies the conditions for a linear operator.

Einstein's theory of general relativity is a nonlinear theory whose dynamical variable is a gravitational field, the field that describes, for example, how planets move around a star. Being a nonlinear theory, you simply cannot add the gravitational fields of different solutions to find a new solution. This makes Einstein's theory rather complicated, by all accounts much more complicated than Maxwell theory. In fact, classical mechanics, as invented mostly by Isaac Newton, is also a nonlinear theory! In classical mechanics the dynamical variables are positions and velocities of particles, acted by forces. There is no general way to use two solutions to build a third.

Indeed, consider the equation of motion for a particle on a line under the influence of a time-independent potential \(V(x)\), which is in general an arbitrary function of \(x\). The dynamical variable in this problem is \(x(t)\), the position as a function of time. Letting \(V^{\prime}\) denote the derivative of \(V\) with respect to its argument, Newton's second law takes the form

\[m\,\frac{d^{2}x(t)}{dt^{2}}\ =\ -V^{\prime}(x(t))\,. \tag{1.8}\]

The left-hand side is the mass times acceleration and the right hand side is the force experienced by the particle in the potential. It is probably worth to emphasize that the right hand side is the function \(V^{\prime}(x)\) evaluated for \(x\) set equal to \(x(t)\):

\[V^{\prime}(x(t))\ \equiv\frac{\partial V(x)}{\partial x}\Big{|}_{x=x(t)}\,. \tag{1.9}\]

While we could have used here an ordinary derivative, we wrote a partial derivative as is commonly done for the general case of time dependent potentials. The reason equation (1.8) is not a linear equation is that the function \(V^{\prime}(x)\) is not linear. In general, for arbitrary functions \(u\) and \(v\) we expect

\[V^{\prime}(au)\neq\ aV^{\prime}(u)\,,\quad\mbox{and}\quad V^{\prime}(u+v)\neq V ^{\prime}(u)+V(v)\,. \tag{1.10}\]

As a result given a solution \(x(t)\), the scaled solution \(\alpha x(t)\) is not expected to be a solution. Given two solutions \(x_{1}(t)\) and \(x_{2}(t)\) then \(x_{1}(t)+x_{2}(t)\) is not guaranteed to be a solution either.

**Exercise.** What is the most general potential \(V(x)\) for which the equation of motion for \(x(t)\) is linear?

Quantum mechanics is a linear theory. The signature equation in this theory, the so-called Schrodinger equation is a linear equation for a quantity called the **wavefunction** and it determines its time evolution. The wavefunction is the dynamical variable in quantum mechanics but, curiously, its physical interpretation was not clear to Erwin Schrodinger when he wrote the equation in 1925. It was Max Born, who months later suggested that the wavefunction encodes probabilities. This was the correct physical interpretation, but it was thoroughly disliked by many, including Schrodinger, who remained unhappy about it for the rest of his life. The linearity of quantum mechanics implies a profound simplicity. In some sense quantum mechanics is simpler than classical mechanics. In quantum mechanics solutions can be added to form new solutions.

The wavefunction \(\Psi\) depends on time and may also depend on space. The Schrodinger equation (SE) is a partial differential equation that takes the form

\[i\hbar\,\frac{\partial\Psi}{\partial t}=\hat{H}\Psi\,, \tag{1.11}\]where the Hamiltonian (or energy operator) \(\hat{H}\) is a linear operator that can act on wavefunctions:

\[\hat{H}(a\Psi)\ =\ a\,\hat{H}\,\Psi\,,\qquad\hat{H}(\Psi_{1}+\Psi_{2})\ =\ \hat{H}(\Psi_{1})+\hat{H}(\Psi_{2})\,, \tag{1.12}\]

with \(a\) a constant that in fact need not be real; it can be a complex number. Of course, \(\hat{H}\) itself does not depend on the wavefunction! To check that the Schrodinger equation is linear we cast it in the form \(L\Psi=0\) with \(L\) defined as

\[L\Psi\ \equiv\ i\hbar\,\frac{\partial\Psi}{\partial t}-\hat{H}\Psi \tag{1.13}\]

It is now a simple matter to verify that \(L\) is a linear operator. Physically this means that if \(\Psi_{1}\) and \(\Psi_{2}\) are solutions to the Schrodinger equation, then so is the superposition \(\alpha\Psi_{1}+\beta\Psi_{2}\), where \(\alpha\) and \(\beta\) are both complex numbers, i.e. \((\alpha,\beta\in\mathbb{C})\)

## 2 Complex Numbers are Essential

Quantum mechanics is the first physics theory that truly makes use of _complex_ numbers. The numbers most of us use for daily life (integers, fractions, decimals) are _real_ numbers. The set of complex numbers is denoted by \(\mathbb{C}\) and the set of real numbers is denoted by \(\mathbb{R}\). Complex numbers appear when we combine real numbers with the imaginary unit \(i\), defined to be equal to the square root of minus one: \(i\equiv\sqrt{-1}\). Being the square root of minus one, it means that \(i\) squared must give minus one: \(i^{2}=-1\). Complex numbers are fundamental in mathematics. An equation like \(x^{2}=-4\), for an unknown \(x\) cannot be solved if \(x\) has to be real. No real number squared gives you minus one. But if we allow for complex numbers, we have the solutions \(x=\pm 2i\). Mathematicians have shown that all polynomial equations can be solved in terms of complex numbers.

A complex number \(z\), in all generality, is a number of the form

\[z\,=\,a+ib\ \in\ \mathbb{C}\,,\quad a,b\in\mathbb{R}\,. \tag{2.1}\]

Here \(a\) and \(b\) are real numbers, and \(ib\) denotes the product of \(i\) with \(b\). The number \(a\) is called the real part of \(z\) and \(b\) is called the imaginary part of \(z\):

\[\operatorname{Re}z=a\,,\qquad\operatorname{Im}z=b\,. \tag{2.2}\]

The complex conjugate \(z^{*}\) of \(z\) is defined by

\[z^{*}\ =\ a-ib\,. \tag{2.3}\]

You can quickly verify that a complex number \(z\) is real if \(z^{*}=z\) and it is purely imaginary if \(z^{*}=-z\). For any complex number \(z=a+ib\) one can define the _norm_\(|z|\) of the complex number to be a _positive, real_ number given by

\[|z|=\sqrt{a^{2}+b^{2}}\,. \tag{2.4}\]

You can quickly check that

\[|z|^{2}=zz^{*}\,, \tag{2.5}\]

where \(z^{*}\equiv a-ib\) is called the complex conjugate of \(z=a+ib\). Complex numbers are represented as vectors in a two dimensional "complex plane". The real part of the complex number is the \(x\) component of the vector and the imaginary part of the complex number is the \(y\) component. If you consider the unit length vector in the complex plane making an angle \(\theta\) with the \(x\) axis has \(x\) component \(\cos\theta\) and \(y\) component \(\sin\theta\). The vector is therefore the complex number \(\cos\theta+i\sin\theta\). Euler's identity relates this to the exponential of \(i\theta\):

\[e^{i\theta}\ =\ \cos\theta+i\sin\theta\,. \tag{2.6}\]

A complex number of the form \(e^{i\chi}\), with \(\chi\) real is called a _pure phase_.

While complex numbers are sometimes useful in classical mechanics or Maxwell theory, they are not strictly needed. None of the dynamical variables, which correspond to measurable quantities, is a complex number. In fact, complex numbers can't be measured at all: all measurements in physics result in real numbers. In quantum mechanics, however, complex numbers are fundamental. The Schrodinger equation involves complex numbers. Even more, the wavefunction, the dynamical variable of quantum mechanics it itself a complex number:

\[\Psi\in\mathbb{C}\,. \tag{2.7}\]

Since complex numbers cannot be measured the relation between the wavefunction and a measurable quantity must be somewhat indirect. Born's idea to identify probabilities, which are always positive real numbers, with the square of the norm of the wavefunction was very natural. If we write the wavefunction of our quantum system as \(\Psi\), the probabilities for possible events are computed from \(|\Psi|^{2}\). The mathematical framework required to express the laws of quantum mechanics consists of complex vector spaces. In any vector space we have objects called vectors that can be added together. In a complex vector space a vector multiplied by a complex number is still a vector. As we will see in our study of quantum mechanics it is many times useful to think of the wavefunction \(\Psi\) as a vector in some complex vector space.

## 3 Loss of Determinism

Maxwell's crowning achievement was the realization that his equations of electromagnetism allowed for the existence of propagating waves. In particular, in 1865 he conjectured that light was an electromagnetic wave, a propagating fluctuation of electric and magnetic fields. He was proven right in subsequent experiments. Towards the end of the nineteenth century physicists were convinced that light was a wave. The certainty, however, did not last too long. Experiments on blackbody radiation and on the photo-emission of electrons suggested that the behavior of light had to be more complicated than that of a simple wave. Max Planck and Albert Einstein were the most prominent contributors to the resolution of the puzzles raised by those experiments.

In order to explain the features of the photoelectric effect, Einstein postulated (1905) that in a light beam the energy comes in quanta - the beam is composed of packets of energy. Einstein essentially implied that light was made up of particles, each carrying a fixed amount of energy. He himself found this idea disturbing, convinced like most other contemporaries that, as Maxwell had shown, light was a wave. He anticipated that a physical entity, like light, that could behave both as a particle and as a wave could bring about the demise of classical physics and would require a completely new physical theory. He was in fact right. Though he never quite liked quantum mechanics, his ideas about particles of light, later given the name _photons_, helped construct this theory.

It took physicists until 1925 to accept that light could behave like a particle. The experiments of Arthur Compton (1923) eventually convinced most skeptics. Nowadays, particles of light, or photons, are routinely manipulated in laboratories around the world. Even if mysterious, we have grown accustomed to them. Each photon of visible light carries very little energy - a small laser pulse can contain many billions of photons. Our eye, however, is a very good photon detector: in total darkness, we are able to see light when as little as ten photons hit upon our retina. When we say that light behaves like a particle we mean a quantum mechanical particle: a packet of energy and momentum that is not composed of smaller packets. We _do not_ mean a classical point particle or Newtonian corpuscle, which is a zero-size object with definite position and velocity.

As it turns out, the energy of a photon depends only on the color of the light. As Einstein discovered the energy \(E\) and frequency \(\nu\) for a photon are related by

\[E=h\nu \tag{3.1}\]

The frequency of a photon determines the wavelength \(\lambda\) of the light through the relation \(\nu\lambda=c\), where \(c\) is the speed of light. All green photons, for example, have the same energy. To increase the energy in a light beam while keeping the same color, one simply needs more photons.

As we now explain, the existence of photons implies that Quantum Mechanics is not deterministic. By this we mean that the result of an experiment cannot be determined, as it would in classical physics, by the conditions that are under the control of the experimenter.

Consider a polarizer whose preferential direction is aligned along the \(\hat{\bf x}\) direction, as shown in Figure 1. Light that is linearly polarized along the \(\hat{\bf x}\) direction namely, light whose electric field points in this direction, goes through the polarizer. If the incident light polarization is orthogonal to the \(\hat{\bf x}\) direction the light will not go through at all. Thus light linearly polarized in the \(\hat{\bf y}\) direction will be totally absorbed by the polarizer. Now consider light polarized along a direction forming an angle \(\alpha\) with the \(x\)-axis, as shown in Figure 2. What happens?

Thinking of the light as a propagating wave, the incident electric field \({\bf E}_{\alpha}\) makes an angle \(\alpha\) with the \(x\)-axis and therefore takes the form

\[{\bf E}_{\alpha}\ =\ E_{0}\cos\alpha\ \hat{\bf x}+E_{0}\sin\alpha\ \hat{\bf y }\,. \tag{3.2}\]

Figure 1: A polarizer that transmits light linearly polarized along the \(\hat{\bf x}\) direction.

Figure 2: Light linearly polarized along the direction at an angle \(\alpha\) hitting the polarizer.

This is an electric field of magnitude \(E_{0}\). In here we are ignoring the time and space dependence of the wave; they are not relevant to our discussion. When this electric field hits the polarizer, the component along \(\hat{\bf x}\) goes through and the component along \(\hat{\bf y}\) is absorbed. Thus

\[\mbox{Beyond the polarizer:}\qquad{\bf E}=E_{0}\cos\alpha\;\hat{\bf x}\,. \tag{3.3}\]

You probably recall that the energy in an electromagnetic wave is proportional to the square of the magnitude of the electric field. This means that the fraction of the beam's energy that goes through the polarizer is \((\cos\alpha)^{2}\). It is also well known that the light emerging from the polarizer has the _same frequency_ as the incident light.

So far so good. But now, let us try to understand this result by thinking about the photons that make up the incident light. The premise here is that all photons in the incident beam are identical. Moreover the photons do not interact with each other. We could even imagine sending the whole energy of the incident light beam one photon at a time. Since all the light that emerges from the polarizer has the same frequency as the incident light, and thus the same frequency, we must conclude that each individual photon either goes through or is absorbed. If a fraction of a photon went through it would be a photon of lower energy and thus lower frequency, which is something that does not happen.

But now we have a problem. As we know from the wave analysis, roughly a fraction \((\cos\alpha)^{2}\) of the photons must go through, since that is the fraction of the energy that is transmitted. Consequently a fraction \(1-(\cos\alpha)^{2}\) of the photons must be absorbed. But if all the photons are identical, why is it that what happens to one photon does not happen to all of them?

The answer in quantum mechanics is that there is indeed a loss of determinism. No one can predict if a photon will go through or will get absorbed. The best anyone can do is to predict probabilities. In this case there would be a probability \((\cos\alpha)^{2}\) of going through and a probability \(1-(\cos\alpha)^{2}\) of failing to go through.

Two escape routes suggest themselves. Perhaps the polarizer is not really a homogeneous object and depending exactly on where the photon his it either gets absorbed or goes through. Experiments show this is not the case. A more intriguing possibility was suggested by Einstein and others. A possible way out, they claimed, was the existence of _hidden variables_. The photons, while apparently identical, would have other _hidden_ properties, not currently understood, that would determine with certainty which photon goes through and which photon gets absorbed. Hidden variable theories would seem to be untestable, but surprisingly they can be tested. Through the work of John Bell and others, physicists have devised clever experiments that rule out most versions of hidden variable theories. No one has figured out how to restore determinism to quantum mechanics. It seems to be an impossible task.

When we try to describe photons quantum mechanically we could use wavefunctions, or equivalently the language of states. A photon polarized along the \(\hat{\bf x}\) direction is not represented using an electric field, but rather we just give a name for its _state_:

\[\left|\mbox{photon};x\right>. \tag{3.4}\]

We will learn the rules needed to manipulate such objects, but for the time being you could think of it like a vector in some space yet to be defined. Another state of a photon, or vector is

\[\left|\mbox{photon};y\right>, \tag{3.5}\]representing a photon polarized along \(\hat{\bf y}\). These states are the wavefunctions that represent the photon. We now claim that the photons in the beam that is polarized along the direction \(\alpha\) are in a state \(\left|{\rm photon};\alpha\right\rangle\) that can be written as a superposition of the above two states:

\[\left|{\rm photon};\alpha\right\rangle\ =\ \cos\alpha\left|{\rm photon};x\right\rangle+ \sin\alpha\left|{\rm photon};y\right\rangle. \tag{3.6}\]

This equation should be compared with (3.2). While there are some similarities -both are superpositions-one refers to electric fields and the other to "states" of a single photon. Any photon that emerges from the polarizer will necessarily be polarized in the \(\hat{\bf x}\) direction and therefore it will be in the state

\[{\rm Beyond\ the\ polarizer:}\quad\left|{\rm photon};x\right\rangle. \tag{3.7}\]

This can be compared with (3.3) which with the factor \(\cos\alpha\) carries information about the amplitude of the wave. Here, for a single photon, there is no room for such a factor.

In the famous Fifth Solvay International Conference of 1927 the world's most notable physicists gathered to discuss the newly formulated quantum theory. Seventeen out of the twenty nine attendees were or became Nobel Prize winners. Einstein, unhappy with the uncertainty in quantum mechanics stated the nowadays famous quote: "God does not play dice", to which Niels Bohr is said to have answered: "Einstein, stop telling God what to do." Bohr was willing to accept the loss of determinism, Einstein was not.

## 4 Quantum Superpositions

We have already discussed the concept of linearity; the idea that the sum of two solutions representing physical realities represents a new, allowed, physical reality. This superposition of solutions has a straightforward meaning in classical physics. In the case of electromagnetism, for example, if we have two solutions, each with its own electric and magnetic field, the "sum" solution is simply understood: its electric field is the sum of the electric fields of the two solutions and its magnetic field is the sum of the magnetic fields of the two solutions. In quantum mechanics, as we have explained, linearity holds. The interpretation of a superposition, however, is very surprising.

One interesting example is provided by a Mach-Zehnder interferometer; an arrangement of beam splitters, mirrors, and detectors used by Ernst Mach and Ludwig Zehnder in the 1890's to study interference between two beams of light.

A beam splitter, as its name indicates, splits an incident beam into two beams, one that is reflected from the splitter and one that goes through the splitter. Our beam-splitters will be balanced: they split a given beam into two beams of equal intensity (Figure 3). The light that bounces off is called the reflected beam, the light that goes through is called the transmitted beam. The incident beam can hit the beam splitter from the top or from the bottom.

The Mach-Zehnder configuration, shown in Figure 4, has a left beam splitter (BS1) and a right beam splitter (BS2). In between we have the two mirrors, M1 on the top and M2 on the bottom. An incoming beam from the left is split by BS1 into two beams, each of which hits a mirror and is then sent into BS2. At BS2 the beams are recombined and sent into two outgoing beams that go into photon detectors D0 and D1.

It is relatively simple to arrange the beam-splitters so that the incoming beam, upon splitting at BS1 and recombination at BS2 emerges in the top beam which goes into D0. In this arrangement no light at all goes into D1. This requires a precise interference effect at BS2. Note that we have two beams incident upon BS2; the top beam is called '\(a\)' and the lower beam is called '\(b\)'. Two contributions go towards D0: the reflection of '\(a\)' at BS2 and the transmission from '\(b\)' at BS2. These two contributions interfere constructively to give a beam going into D0. Two contributions also go towards D1: the transmission from '\(a\)' at BS2 and the reflection from '\(b\)' at BS2. These two can indeed be arranged to interfere destructively to give no beam going into D1.

It is instructive to think of the incoming beam as a sequence of photons that we send into the interferometer, one photon at a time. This shows that, at the level of photons, the interference is not interference of one photon with another photon. Each photon must interfere with _itself_ to give the result. Indeed interference between two photons is not possible: destructive interference, for example, would require that two photons end up giving no photon, which is impossible by energy conservation.

Therefore, each photon does the very strange thing of going through both branches of the interferometer! Each photon is in a superposition of two states: a state in which the photon is in the top beam or upper branch, added to a state in which the photon is in the bottom beam or lower branch. Thus the state of the photon in the interferometer is a funny state in which the photon seems to be doing two incompatible things at the same time.

Figure 4: A Mach-Zehnder interferometer consists of two beam splitters BS1 and BS2, two mirrors M1 and M2, and two detectors D0 and D1. An incident beam will be split into two beams by BS1. One beam goes through the upper branch, which contains M1, the other beam goes through the lower branch, which contains M2. The beams on the two branches recombine at BS2 and are then sent into the detectors. The configuration is prepared to produce an interference so that all incident photons end at the detector D0, with none at D1.

Figure 3: An incident beam hitting a beam-splitter results in a reflected beam and a transmitted beam. Left: incident beam coming from the top. Right: incident beam coming from the bottom.

Equation (3.6) is another example of a quantum superposition. The photon state has a component along an \(x\)-polarized photon and a component along a \(y\)-polarized photon.

When we speak of a wavefunction, we also sometimes call it a state, because the wavefunction specifies the "state" of our quantum system. We also sometimes refer to states as vectors. A quantum state may not be a vector like the familiar vectors in three-dimensional space but it is a vector nonetheless because it makes sense to add states and to multiply states by numbers. Just like vectors can be added, linearity guarantees that adding wavefunctions or states is a sensible thing to do. Just like any vector can be written as a sum of other vectors in many different ways, we will do the same with our states. By writing our physical state as sums of other states we can learn about the properties of our state.

Consider now two states \(\big{|}A\big{\rangle}\) and \(\big{|}B\big{\rangle}\). Assume, in addition, that when measuring some property \(Q\) in the state \(\big{|}A\big{\rangle}\) the answer is always \(a\), and when measuring the same property \(Q\) in the state \(\big{|}B\big{\rangle}\) the answer is always \(b\). Suppose now that our physical state \(\big{|}\Psi\big{\rangle}\) is the superposition

\[\big{|}\Psi\big{\rangle}\ =\ \alpha\big{|}A\big{\rangle}+\beta\big{|}B\big{\rangle} \,,\qquad\alpha,\beta\in\mathbb{C}\,. \tag{4.1}\]

What happens now if we measure property \(Q\) in the system described by the state \(\big{|}\Psi\big{\rangle}\)? It may seem reasonable that one gets some intermediate value between \(a\) and \(b\), but this is not what happens. A measurement of \(Q\) will yield either \(a\) or \(b\). There is no certain answer, classical determinism is lost, but the answer is always one of these two values and not an intermediate one. The coefficients \(\alpha\) and \(\beta\) in the above superposition affect the probabilities with which we may obtain the two possible values. In fact, the probabilities to obtain \(a\) or \(b\)

\[\text{Probability}(a)\sim|\alpha|^{2}\,,\quad\text{Probability}(b)\sim|\beta|^{ 2}\,. \tag{4.2}\]

Since the only two possibilities are to measure \(a\) or \(b\), the actual probabilities must sum to one and therefore they are given by

\[\text{Probability}(a)\ =\ \frac{|\alpha|^{2}}{|\alpha|^{2}+|\beta|^{2}}\,,\quad \text{Probability}(b)\ =\ \frac{|\beta|^{2}}{|\alpha|^{2}+|\beta|^{2}}\,. \tag{4.3}\]

If we obtain the value \(a\), immediate repeated measurements would still give \(a\), so the state after the measurement must be \(\big{|}A\big{\rangle}\). The same happens for \(b\), so we have

\[\begin{array}{l}\text{After measuring $a$ \ the state becomes $\big{|}\Psi\big{\rangle}\ =|A\big{\rangle}$}\,,\\ \text{After measuring $b$ \ the state becomes $\big{|}\Psi\big{\rangle}\ =|B\big{\rangle}$}\,.\end{array} \tag{4.4}\]

In quantum mechanics one makes the following assumption: _Superposing a state with itself doesn't chance the physics_, nor does it change the state in a non-trivial way. Since superimposing a state with itself simply changes the overall number multiplying it, we have that \(\Psi\) and \(\alpha\Psi\) represent the same physics for any complex number \(\alpha\) different from zero. Thus, letting \(\cong\) represent physical equivalence

\[\big{|}A\big{\rangle}\cong 2\big{|}A\big{\rangle}\cong i\big{|}A\big{\rangle} \cong-|A\rangle\,. \tag{4.5}\]

This assumption is necessary to verify that the polarization of a photon state has the expected number of degrees of freedom. The polarization of a plane wave, as one studies in electromagnetism, is described by two real numbers. For this consider an elliptically polarized wave, as shown in Figure 5. At any given point, the electric field vector traces an ellipse whose shape is encoded by the ratio \(a/b\) of the semi-major axes (the first real parameter) and a tilt encoded by the angle \(\theta\) (the second real parameter). Consider for this a general photon state formed by superposition of the two independent polarization states \(|{\rm photon};x\rangle\) and \(|{\rm photon};y\rangle\):

\[\alpha|{\rm photon};x\rangle+\beta|{\rm photon};y\rangle\,,\quad\alpha,\beta\in \mathbb{C}\,. \tag{4.6}\]

At first sight it looks as if we have two complex parameters \(\alpha\) and \(\beta\), or equivalently, four real parameters. But since the overall factor does not matter we can multiply this state by \(1/\alpha\) to get the equivalent state that encodes all the physics

\[|{\rm photon};x\rangle+\tfrac{\beta}{\alpha}\,|{\rm photon};y\rangle\,, \tag{4.7}\]

showing that we really have one complex parameter, the ratio \(\beta/\alpha\). This is equivalent to two real parameters, as expected.

Let us do a further example of superposition using electrons. Electrons are particles with spin. Classically, we imagine them as tiny balls spinning around an axis that goes through the particle itself. Once an axis is fixed, the electron has two and only two options: its rotation may be clockwise or counterclockwise about the axis, but in both cases it spins at the same fixed rate. These opposite ways of spinning are called _spin up_ and _spin down_ along the axis (see Figure 6). The up and down refer to the direction of the angular momentum associated with the rotation, and it is indicated by an arrow. According to quantum mechanics, and as verified by multiple experiments, the same possibilities, up or down, arise _whatever_ axis we use to measure the spin of the electron.

Physicists usually set up coordinate systems in space by choosing three orthogonal directions, the directions of the \(x\), \(y\), and \(z\) axes. Let us choose to describe our spinning electrons using the \(z\) axis. One possible state of an electron is to be spin up along the \(z\) axis. Such a state is described as \(|\uparrow;z\rangle\), with an arrow pointing up, and the label \(z\) indicating that the spin arrow points along the increasing \(z\) direction. Another possible state of an electron is spin down along the \(z\) axis. Such a state is described as \(|\downarrow;z\rangle\), with an arrow pointing down, meaning this time that the spin points along the decreasing \(z\) direction. If these two are possible realities, so it would be the state \(|\Psi\rangle\) representing the sum

\[|\Psi\rangle\ =\ |\uparrow;z\rangle\,+\,|\downarrow;z\rangle\,.\]

The state \(|\Psi\rangle\) is in a superposition of a spin up and a spin down state. What kind of physics does this sum \(|\Psi\rangle\) represent? It represents a state in which a measurement of the spin along the \(z\) axis would result in two possible outcomes with equal probabilities: an electron with spin up or an electron with spin down. Since we can only speak of probabilities, any experiment must involve repetition until

Figure 5: Parameters that define an elliptically polarized state.

probabilities can be determined. Suppose we had a large ensemble of such electrons, all of them in the above state \(\left|\Psi\right\rangle\). As we measured their spin along \(z\), one at a time, we would find about half of them spinning up along \(z\) and the other half spinning down along \(z\). There is no way to predict which option will be realized as we measure each electron. It is not easy to imagine superposition, but one may try as follows. An electron in the above state is in a different kind of existence in which it is able to both be spinning up along \(z\) and spinning down along \(z\) simultaneously! It is in such a ghostly, eerie state, doing incompatible things simultaneously, until its spin is measured. Once measured, the electron must immediately choose one of the two options; we always find electrons either spinning up or spinning down.

A critic of quantum mechanics could suggest a simpler explanation for the above observations. He or she would claim that the following simpler ensemble results in identical experimental results. In the critic's ensemble we have a large number of electrons with 50% of them in the state \(\left|\uparrow;z\right\rangle\) and 50% of them in the state \(\left|\downarrow;z\right\rangle\). He or she would then state, correctly, that such an ensemble would yield the same measurements of spins along \(z\) as the ensemble of those esoteric \(\left|\Psi\right\rangle\) states. The new ensemble could provide a simpler explanation of the result without having to invoke quantum superpositions.

Quantum mechanics, however, allows for further experiments that can distinguish between the ensemble of our friendly critic and the ensemble of \(\left|\Psi\right\rangle\) states. While it would take us too far afield to explain this, if we measured the spin of the electrons in the \(x\) direction, instead of \(z\) direction, the results would be _different_ in the two ensembles. In the ensemble of our critic we would find 50% of the electrons up along \(x\) and 50% of the electrons down along \(x\). In our ensemble of \(\left|\Psi\right\rangle\) states, however, we would find a very simple result: all states pointing up along \(x\). The critic's ensemble is not equivalent to our quantum mechanical ensemble. The critic is thus shown wrong in his or her attempt to show that quantum mechanical superpositions are not required.

## 5 Entanglement

When we consider superposition of states of _two_ particles we can get the remarkable phenomenon called _quantum mechanical entanglement_. Entangled states of two particles are those in which we can't speak separately of the state of each particle. The particles are bound together in a common

Figure 6: An electron with spin along the \(z\) axis. Left: the electron is said to have spin up along \(z\). Right: the electron is said to have spin down along \(z\). The up and down arrows represent the direction of the angular momentum associated with the spinning electron.

state in which they are _entangled_ with each other.

Let us consider two non-interacting particles. Particle 1 could be in any of the states

\[\{\big{|}u_{1}\big{\rangle},\big{|}u_{2}\big{\rangle},...\}\,, \tag{5.1}\]

while particle 2 could be in any of the states

\[\{\big{|}v_{1}\big{\rangle},\big{|}v_{2}\big{\rangle},...\} \tag{5.2}\]

It may seem reasonable to conclude that the state of the full system, including particle 1 and particle 2 would be specified by stating the state of particle 1 and the state of particle 2. If that would be the case the possible states would be written as

\[\big{|}u_{i}\big{\rangle}\otimes\big{|}v_{j}\big{\rangle},\quad i,j\in\mathbb{ N}\,, \tag{5.3}\]

for some specific choice of \(i\) and \(j\) that specify the state of particle one and particle two, respectively. Here we have used the symbol \(\otimes\), which means _tensor_ product, to combine the two states into a single state for the whole system. We will study \(\otimes\) later, but for the time being we can think of it as a kind of product that distributes over addition and obeys simple rules, as follows

\[\begin{split}(\alpha_{1}\big{|}u_{1}\big{\rangle}+\alpha_{2} \big{|}u_{2}\big{\rangle})\otimes(\beta_{1}\big{|}v_{1}\big{\rangle}+\beta_{2 }\big{|}v_{2}\big{\rangle})&=\quad\alpha_{1}\beta_{1}\big{|}u_ {1}\big{\rangle}\otimes\big{|}v_{1}\big{\rangle}+\alpha_{1}\beta_{2}\big{|}u_ {1}\big{\rangle}\otimes\big{|}v_{2}\big{\rangle}\\ &\quad+\alpha_{2}\beta_{1}\big{|}u_{2}\big{\rangle}\otimes\big{|} v_{1}\big{\rangle}+\alpha_{2}\beta_{2}\big{|}u_{2}\big{\rangle}\otimes\big{|}v_{2} \big{\rangle}\,.\end{split} \tag{5.4}\]

The numbers can be moved across the \(\otimes\) but the order of the states must be preserved. The state on the left-hand side -expanded out on the right-hand side- is still of the type where we combine a state of the first particle \((\alpha_{1}\big{|}u_{1}\big{\rangle}+\alpha_{2}\big{|}u_{2}\big{\rangle})\) with a state of the second particle \((\beta_{1}\big{|}v_{1}\big{\rangle}+\beta_{2}\big{|}v_{2}\big{\rangle})\). Just like any one of the states listed in (5.3) this state is not entangled.

Using the states in (5.3), however, we can construct more intriguing superpositions. Consider the following one

\[\big{|}u_{1}\big{\rangle}\otimes\big{|}v_{1}\big{\rangle}+\big{|}u_{2}\big{ }\big{\rangle}\otimes\big{|}v_{2}\big{\rangle}\,. \tag{5.5}\]

A state of two particles is said to be **entangled** if it cannot be written in the factorized form \((\cdots)\otimes(\cdots)\) which allows us to describe the state by simply stating the state of each particle. We can easily see that the state (5.5) cannot be factorized. If it could it would have to be with a product as indicated in (5.4). Clearly, involving states like \(|u_{3}\rangle\) or \(|v_{3}\rangle\) that do not appear in (5.5) would not help. To determine the constants \(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}\) we compare the right hand side of (5.4) with our state and conclude that we need

\[\alpha_{1}\beta_{1}=1\,,\quad\alpha_{1}\beta_{2}=0\,,\quad\alpha_{2}\beta_{1} =0\,,\quad\alpha_{2}\beta_{2}=1\,. \tag{5.6}\]

It is clear that there is no solution here. The second equation, for example, requires either \(\alpha_{1}\) or \(\beta_{2}\) to be zero. Having \(\alpha_{1}=0\) contradicts the first equation, and having \(\beta_{2}=0\) contradicts the last equation. This confirms that the state (5.5) is indeed an entangled state. There is no way to describe the state by specifying a state for each of the particles.

Let us illustrate the above discussion using electrons and their spin states. Consider a state of two electrons denoted as \(|\uparrow\rangle\otimes|\downarrow\rangle\). As the notation indicates, the first electron, described by the first arrow, is up along \(z\) while the second electron, described by the second arrow, is down along \(z\) (we omit the label \(z\) on the state for brevity). This is not an entangled state. Another possible state is one where they are doing exactly the opposite: in \(|\downarrow\rangle\otimes|\uparrow\rangle\) the first electron is down and the second is up. This second state is also not entangled. It now follows that by superposition we can consider the state

\[|\uparrow\rangle\otimes|\downarrow\rangle\ +\ |\downarrow\rangle\otimes|\uparrow\rangle\,. \tag{5.7}\]

This is a entangled state of the pair of electrons.

**Exercise**. Show that the above state cannot be factorized and thus is indeed entangled.

In the state (5.7) the first electron is up along \(z\) if the second electron is down along \(z\) (first term), or the first electron is down along \(z\) if the second electron is up along \(z\) (second term). There is a correlation between the spins of the two particles; they always point in opposite directions. Imagine that the two entangled electrons are very far away from each other: Alice has one electron of the pair on planet earth and Bob has the other electron on the moon. Nothing we know is connecting these particles but nevertheless the states of the electrons are linked. Measurements we do on the separate particles exhibit correlations. Suppose Alice measures the spin of the electron on earth. If she finds it up along \(z\), it means that the first summand in the above superposition is realized, because in that summand the first particle is up. As discussed before, the state of the two particles immediately becomes that of the first summand. This means that the electron on the moon will _instantaneously_ go into the spin down-along-\(z\) configuration, something that could be confirmed by Bob, who is sitting in the moon with that particle in his lab. This effect on Bob's electron happens before a message, carried with the speed of light, could reach the moon telling him that a measurement has been done by Alice on the earth particle and the result was spin up. Of course, experiments must be done with an ensemble that contains many pairs of particles, each pair in the same entangled state above. Half of the times the electron on earth will be found up, with the electron on the moon down and the other half of the times the electron on earth will be found down, with the electron on the moon up.

Our friendly critic could now say, correctly, that such correlations between the measurements of spins along \(z\) could have been produced by preparing a _conventional_ ensemble in which 50% of the pairs are in the state \(|\uparrow\rangle\otimes|\downarrow\rangle\) and the other 50% of the pairs are in the state \(|\downarrow\rangle\otimes|\uparrow\rangle\). Such objections were dealt with conclusively in 1964 by John Bell, who showed that if Alice and Bob are able to measure spin in _three_ arbitrary directions, the correlations predicted by the quantum entangled state are different from the classical correlations of _any_ conceivable conventional ensemble. Quantum correlations in entangled states are very subtle and it takes sophisticated experiments to show they are not reproducible as classical correlations. Indeed, experiments with entangled states have confirmed the existence of quantum correlations. The kind of instantaneous action at a distance associated with measurements on well-separated entangled particles does not lead to paradoxes nor, as it may seem, to contradictions with the ideas of special relativity. You cannot use quantum mechanical entangled states to send information faster than the speed of light.

_Sarah Geller transcribed Zwiebach's handwritten notes to create the first LaTeX version of this document._MIT OpenCourseWare

[https://ocw.mit.edu](https://ocw.mit.edu)

8.04 Quantum Physics I

Spring 2016

For information about citing these materials or our Terms of Use, visit: [https://ocw.mit.edu/terms](https://ocw.mit.edu/terms).

# **References and Additional Learning**

## **Documentation**

- **[Nougat: Neural Optical Understanding for Academic Documents](https://github.com/facebookresearch/nougat) GitHub Repository**
- **[Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) Paper on arXiv**

## **Lectures**

- **[Quantum Physics I](https://ocw.mit.edu/courses/8-04-quantum-physics-i-spring-2016/) from MIT OpenCourseWare**

## **Papers**

- **[The Fundamental Equations of Quantum Mechanics](https://www.informationphilosopher.com/solutions/scientists/dirac/Fund_QM_1925.pdf) by Paul Dirac from the Proceedings of the Royal Society of London**
- **[The Postulates of Quantum Mechanics](https://www.sydney.edu.au/science/chemistry/~mjtj/CHEM3117/Resources/postulates.pdf) from the University of Sydney**

# **Connect**
- **Feel free to connect with Adrian on [YouTube](https://www.youtube.com/channel/UCPuDxI3xb_ryUUMfkm0jsRA), [LinkedIn](https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/), [Twitter](https://twitter.com/DolinayG), [GitHub](https://github.com/ad17171717), [Medium](https://adriandolinay.medium.com/) and [Odysee](https://odysee.com/@adriandolinay:0). Happy coding!**