```{autolink-concat}
```

<!-- no-set-nb-cells -->

# Symbolic amplitude models

Amplitude analysis attempts to describe the intensity distributions that we observe in particle collider experiments in order to extract parameters of the intermediate states that appear in the scattering process. The scattering processes that take place in these experiments are governed by the electroweak force and the strong force. The strong force is described by Quantum Chromodynamics, which exhibits complicated behavior in the low-energy regime.

The complicated nature of the strong force makes it difficult to derive intensity models from first principles. Instead, we have to rely on approximations that hold under specific assumptions about the scattering process under study. Almost every amplitude model that we formulate is therefore merely an approximation of the true scattering process. As a consequence, we always have to reassess our analysis results and try alternative models. In addition, amplitude models can be extremely complicated, with large, complex-valued parametrizations and dozens of input parameters. We therefore want to evaluate these models with as much information as possible, which means using large input data samples and performing 'fits' with the full likelihood function.

Given these challenges, we can identify **three major requirements that amplitude analysis software should satisfy**:

:::{card} {material-regular}`speed` Performance
:link: performance
:link-type: ref
We want to evaluate the likelihood function as fast as possible over large data samples, so that we can optimize our model parameters for several hypotheses.
:::

:::{card} {material-regular}`draw` Flexibility
:link: flexibility
:link-type: ref
We want to quickly formulate a wide range of amplitude models, given the latest theoretical and experimental insights.
:::

:::{card} {material-regular}`school` Transparency
:link: transparency
:link-type: ref
It must be easy to understand the mathematics behind the implemented model, so that the analysis can be reproduced or compared with results from similar experiments.
:::

(performance)=
## {material-regular}`speed` Performance

### Array-oriented programming

Even though Python is a popular programming language for data science, pure Python code is too slow for performing computations over large data samples. Computations in Python programs are therefore almost always delegated to third-party Python libraries that are written in C++ or other compiled languages. This leads to a programming style that we can call **array-oriented programming**: variables represent multidimensional arrays, and the computational backend performs the element-wise operations behind the scenes. This has the additional benefit that the higher-level Python code becomes more readable.

In the following example, we have two data samples $a$ and $b$, each containing a million data points, and we want to compute $c_i=a_i+b_i^2$ for each data point&nbsp;$i$. For simplicity, we set $a$ and $b$ to be `[0, 1, 2, ..., 999_999]` and `[999_999, 999_998, ..., 1, 0]`.

In [None]:
a_lst = list(range(1_000_000))
b_lst = list(reversed(range(1_000_000)))

#### Pure Python loop

Naively, one could compute $c$ for each data point by creating a list and filling it with $c_i = a_i+b_i^2$.

In [None]:
%%timeit
c_lst = []
for a_i, b_i in zip(a_lst, b_lst):
    c_lst.append(a_i + b_i**2)

`for` loops like these are a natural choice when coming from compiled languages like C++, but they are considerably slower in Python.

#### Array-oriented computation

[NumPy](https://numpy.org) is one of the most popular array-oriented libraries for Python. The data points for $a$ and $b$ are now represented by array objects...

In [None]:
import numpy as np

a = np.array(a_lst)
b = np.array(b_lst)

...and our _array-oriented_ computation of $c = a+b^2$ becomes much **faster** and **more readable**.

In [None]:
%%timeit
c = a + b**2
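
To check that the two approaches really compute the same thing, the comparison can also be written as a standalone script. The following is a minimal sketch, not part of the original notebook: the helper names `loop_version` and `array_version` are hypothetical, the sample size is reduced to keep the check quick, and the standard-library `timeit` module stands in for the `%%timeit` cell magic.

```python
import timeit

import numpy as np

# Smaller sample than the one million points used above (assumption, for speed)
n = 100_000
a_lst = list(range(n))
b_lst = list(reversed(range(n)))

# An explicit 64-bit integer type avoids overflow in b**2 on platforms
# where the default NumPy integer is 32 bits wide
a = np.array(a_lst, dtype=np.int64)
b = np.array(b_lst, dtype=np.int64)


def loop_version():
    """Compute c_i = a_i + b_i**2 with a pure Python loop."""
    c_lst = []
    for a_i, b_i in zip(a_lst, b_lst):
        c_lst.append(a_i + b_i**2)
    return c_lst


def array_version():
    """Compute c = a + b**2 with one array-oriented expression."""
    return a + b**2


# Both approaches should produce identical values
assert loop_version() == array_version().tolist()

# Time five calls of each version; the absolute numbers depend on the machine
t_loop = timeit.timeit(loop_version, number=5)
t_array = timeit.timeit(array_version, number=5)
print(f"loop: {t_loop:.4f} s, array: {t_array:.4f} s")
```

The exact speed-up varies with hardware and library versions, so no timing output is quoted here.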

(flexibility)=
## {material-regular}`draw` Flexibility

### Computer Algebra System

### Symbolic expressions as template

(transparency)=
## {material-regular}`school` Transparency

### Self-documenting workflow

### Model preservation

## Summary

We believe that formulating amplitude models symbolically with a Computer Algebra System has several benefits for the amplitude analysis community:

- **Amplitude analyses become reproducible, extendable, and portable:**
  - Implemented amplitude models are transparently shared as mathematical formulas in a [self-documenting workflow](#self-documenting-workflow). This allows others to reimplement those models in their own framework of choice, or at any time in the future when newer languages or libraries have made the original implementation of the analysis outdated.
  - The Python ecosystem in combination with Jupyter Notebooks and Sphinx makes it possible to rerun an analysis directly in the browser or locally in a virtual environment. [Pinned dependencies](https://github.com/ComPWA/update-pip-constraints) ensure that the analysis produces the same results.

- **Lower entry level and knowledge sharing**:<br>
  It becomes much easier to share and maintain knowledge gained about amplitude models and amplitude analysis theory. Symbolic amplitude models directly show the implemented mathematics and their numerical functions can directly be used for interactive visualizations.
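
As a minimal sketch of what "formulating a model symbolically" can look like, the example below reuses the toy expression $c = a + b^2$ from the performance section. It assumes [SymPy](https://www.sympy.org) as the Computer Algebra System, which is a choice made for this illustration and is not named in the text above. The variable names `c_expr` and `c_func` are likewise hypothetical.

```python
import numpy as np
import sympy as sp

# Formulate the expression symbolically: this is the "transparent" form
a, b = sp.symbols("a b")
c_expr = a + b**2

# The same object can be rendered as LaTeX for documentation purposes
print(sp.latex(c_expr))

# Turn the symbolic expression into a numerical, array-oriented function
c_func = sp.lambdify([a, b], c_expr, modules="numpy")

# Evaluate over small NumPy arrays (toy data, chosen for this sketch)
a_data = np.arange(5, dtype=np.int64)
b_data = a_data[::-1]
c_data = c_func(a_data, b_data)
print(c_data)
```

The symbolic expression serves both as the human-readable formula and as the template for the numerical function, so the documented mathematics and the evaluated code cannot drift apart.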