In [None]:
%config Completer.use_jedi = False

# Step 2: Generate data samples

In this section, we will use the {class}`~expertsystem.amplitude.helicity.HelicityModel` that we created with the expert system in [the previous step](1_create_model) to generate a data sample via hit & miss Monte Carlo. We do this with the {mod}`.data.generate` module.

First, we {func}`pickle.load` the {class}`~expertsystem.amplitude.helicity.HelicityModel` that was created in the previous step:

In [None]:
import pickle

with open("helicity_model.pickle", "rb") as model_file:
    model = pickle.load(model_file)

In [None]:
reaction_info = model.kinematics.reaction_info
initial_state = next(iter(reaction_info.initial_state.values()))
print("Initial state:")
print(" ", initial_state.name)
print("Final state:")
for i, p in reaction_info.final_state.items():
    print(f"  {i}: {p.name}")

## 2.1 Generate phase space sample

{class}`~expertsystem.amplitude.kinematics.HelicityKinematics` defines the constraints of the phase space, so we have enough information to generate a **phase-space sample** for this particle reaction. We do this with the {func}`.generate_phsp` function. By default, this function uses {class}`.TFPhaseSpaceGenerator` as phase-space generator (with TensorFlow as back-end) and generates random numbers with {class}`.TFUniformRealNumberGenerator`. You can specify other generators through the arguments of {func}`.generate_phsp`.

In [None]:
import pandas as pd
from tensorwaves.data.generate import generate_phsp

phsp_sample = generate_phsp(300_000, model.kinematics)
pd.DataFrame(phsp_sample.to_pandas())

The resulting phase space sample is a {class}`~expertsystem.amplitude.data.MomentumPool` of {class}`~expertsystem.amplitude.data.FourMomentumSequence`s, one for each particle in the final state. The {meth}`~expertsystem.amplitude.data.MomentumPool.to_pandas` method can be used to convert the {class}`~expertsystem.amplitude.data.MomentumPool` to a format that can be understood by {class}`pandas.DataFrame`.
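To get a feeling for what a phase-space sample contains, here is a minimal NumPy sketch for the simplest possible case: a two-body decay in the rest frame of the parent. This is not how {class}`.TFPhaseSpaceGenerator` works internally, and the reaction in this tutorial has more than two final state particles. The function name, the masses, and the component order $(E, p_x, p_y, p_z)$ are assumptions made for this illustration only.

```python
import numpy as np


def generate_two_body_phsp(size, parent_mass, m1, m2, seed=0):
    """Uniform phase space for parent -> 1 + 2 in the parent rest frame.

    Hypothetical helper for illustration. Returns two arrays of shape
    (size, 4) with columns (E, px, py, pz).
    """
    rng = np.random.default_rng(seed)
    # Break-up momentum, fixed by energy-momentum conservation
    p = np.sqrt(
        (parent_mass**2 - (m1 + m2) ** 2) * (parent_mass**2 - (m1 - m2) ** 2)
    ) / (2 * parent_mass)
    # Isotropic directions: uniform in cos(theta) and in phi
    cos_theta = rng.uniform(-1, 1, size)
    phi = rng.uniform(0, 2 * np.pi, size)
    sin_theta = np.sqrt(1 - cos_theta**2)
    direction = np.stack(
        [sin_theta * np.cos(phi), sin_theta * np.sin(phi), cos_theta], axis=1
    )
    three_momentum = p * direction
    energy_1 = np.full((size, 1), np.sqrt(m1**2 + p**2))
    energy_2 = np.full((size, 1), np.sqrt(m2**2 + p**2))
    # The two particles fly back to back
    p1 = np.hstack([energy_1, three_momentum])
    p2 = np.hstack([energy_2, -three_momentum])
    return p1, p2


# Illustrative masses in GeV (assumed values, not taken from the model)
p1, p2 = generate_two_body_phsp(1_000, parent_mass=1.5, m1=0.135, m2=0.135)
total = p1 + p2  # should equal (parent_mass, 0, 0, 0) for every event
```

Every event conserves energy and momentum by construction. The only randomness is in the decay direction, which is what makes the sample uniform over the (two-body) phase space.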

## 2.2 Generate intensity-based sample

'Data samples' are more complicated than phase space samples in that they represent the intensity profile resulting from a reaction. You therefore need a {class}`.Function` object that expresses an intensity distribution and a phase space over which to generate that distribution. We call such a data sample an **intensity-based sample**.
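The hit & miss Monte Carlo method mentioned at the start of this section can be sketched in a few lines of NumPy. The example below is a toy version under stated assumptions: the "phase space" is a one-dimensional uniform sample, the intensity is a Gaussian peak, and `hit_and_miss` is a hypothetical helper, not the implementation used by {func}`.generate_data`.

```python
import numpy as np


def hit_and_miss(intensity, phsp, size, seed=0):
    """Select `size` phase-space events distributed according to `intensity`.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    weights = intensity(phsp)
    # The maximum is estimated from the sample itself
    max_weight = weights.max()
    # 'Hit': a uniform random number falls below the event's intensity.
    # 'Miss': it does not, and the event is rejected.
    accepted = rng.uniform(0, max_weight, len(phsp)) < weights
    hits = phsp[accepted]
    if len(hits) < size:
        raise ValueError(
            f"Only {len(hits)} events accepted, need {size}: "
            "provide a larger phase-space sample"
        )
    return hits[:size]


def toy_intensity(mass):
    """Toy intensity: a Gaussian peak at 1.0 with width 0.1."""
    return np.exp(-0.5 * ((mass - 1.0) / 0.1) ** 2)


# Toy 'phase space': uniformly distributed values between 0 and 2
toy_phsp = np.random.default_rng(seed=1).uniform(0, 2, 100_000)
toy_data = hit_and_miss(toy_intensity, toy_phsp, size=10_000)
```

The accepted events follow the shape of the intensity, while the rejected ones are thrown away. This is why the phase-space sample has to be considerably larger than the requested data sample, particularly when the intensity is sharply peaked.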

An intensity-based sample is generated with the function {func}`.generate_data`. Its usage is similar to {func}`.generate_phsp`, but now you have to provide a {obj}`.Function` in addition to the {obj}`~expertsystem.amplitude.kinematics.HelicityKinematics` object. A {obj}`.Function` object can be created with the {meth}`.SympyModel.lambdify` method:

In [None]:
from tensorwaves.physics.amplitude import SympyModel

sympy_model = SympyModel(
    expression=model.expression,
    parameters=model.parameters,
)
intensity = sympy_model.lambdify(backend="numpy")

That's it, now we have enough info to create an intensity-based data sample. Notice how the structure is the same as that of the phase-space sample we saw before:

In [None]:
from tensorwaves.data.generate import generate_data

data_sample = generate_data(30_000, model.kinematics, intensity)
pd.DataFrame(data_sample.to_pandas())

## 2.3 Visualize kinematic variables

We now have a phase space sample and an intensity-based sample. Their data structure isn't the most informative though: it's just a collection of four-momentum tuples. However, we can use the {meth}`~expertsystem.amplitude.kinematics.HelicityKinematics.convert` method of the {class}`~expertsystem.amplitude.kinematics.HelicityKinematics` class to convert both samples to data sets of kinematic variables:

In [None]:
phsp_set = model.kinematics.convert(phsp_sample)
data_set = model.kinematics.convert(data_sample)
list(data_set)

The data set is just a {obj}`dict` of kinematic variables (the keys are the variable names, the values are lists of computed values for each event). The numbers you see in the names are final state IDs as defined in the `reaction_info` member of the {class}`~expertsystem.amplitude.kinematics.HelicityKinematics` object:

In [None]:
for state_id, particle in model.kinematics.reaction_info.final_state.items():
    print(f"ID {state_id}:", particle.name)

````{admonition} Available kinematic variables
---
class: dropdown
---
By default, {mod}`tensorwaves` only generates invariant masses of the {class}`Topologies <expertsystem.reaction.topology.Topology>` that are of relevance to the decay problem. In this case, we only have resonances $f_0 \to \pi^0\pi^0$. If you are interested in more invariant mass combinations, you can add them with the method {meth}`~expertsystem.amplitude.kinematics.HelicityKinematics.register_topology`.
````
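For reference, an invariant mass follows from the summed four-momenta as $m^2 = E^2 - |\vec{p}|^2$. The sketch below computes it with NumPy, assuming arrays of shape `(n_events, 4)` with columns $(E, p_x, p_y, p_z)$. The function `invariant_mass` and this array layout are assumptions for illustration and are not part of the {mod}`tensorwaves` API. A variable such as `m_12` would then presumably correspond to the invariant mass of final state particles 1 and 2.

```python
import numpy as np


def invariant_mass(*four_momenta):
    """Invariant mass of the sum of four-momenta.

    Each argument has shape (n_events, 4) with columns (E, px, py, pz).
    Hypothetical helper for illustration only.
    """
    total = np.sum(four_momenta, axis=0)
    mass_squared = total[:, 0] ** 2 - np.sum(total[:, 1:] ** 2, axis=1)
    # Guard against tiny negative values caused by floating-point rounding
    return np.sqrt(np.clip(mass_squared, 0, None))


# Two toy events with two massless particles each:
# event 1 is back to back along z, event 2 is collinear along z
p1 = np.array([[0.5, 0.0, 0.0, 0.5], [0.5, 0.0, 0.0, 0.5]])
p2 = np.array([[0.5, 0.0, 0.0, -0.5], [0.5, 0.0, 0.0, 0.5]])
masses = invariant_mass(p1, p2)  # masses 1.0 and 0.0
```

The back-to-back pair has an invariant mass equal to its total energy, while the collinear pair of massless particles has no invariant mass at all.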

The data set is a {obj}`dict`-like mapping, which allows us to easily convert it to a {class}`pandas.DataFrame` and plot its content in the form of a histogram:

In [None]:
import pandas as pd

data_frame = pd.DataFrame(data_set.to_pandas())
# The phase space frame has to be created from phsp_set, not from data_set
phsp_frame = pd.DataFrame(phsp_set.to_pandas())
data_frame

This also means that we can use all kinds of fancy plotting functionality, for instance from {mod}`matplotlib.pyplot`, to see what's going on. Here's an example:

In [None]:
import numpy as np
from matplotlib import cm

reaction_info = model.kinematics.reaction_info
intermediate_states = sorted(
    (
        p
        for p in model.particles
        if p not in reaction_info.final_state.values()
        and p not in reaction_info.initial_state.values()
    ),
    key=lambda p: p.mass,
)

evenly_spaced_interval = np.linspace(0, 1, len(intermediate_states))
colors = [cm.rainbow(x) for x in evenly_spaced_interval]

In [None]:
import matplotlib.pyplot as plt

data_frame["m_12"].hist(bins=100, alpha=0.5, density=True)
plt.xlabel("$m$ [GeV]")
for i, p in enumerate(intermediate_states):
    plt.axvline(x=p.mass, linestyle="dotted", label=p.name, color=colors[i])
plt.legend();

## 2.4 Export data sets

{mod}`tensorwaves` currently has no export functionality for data samples. However, as we work with {class}`numpy.ndarray` objects, it's easy to just {mod}`pickle` these data samples with {func}`pickle.dump`:

In [None]:
import pickle

with open("data_set.pickle", "wb") as stream:
    pickle.dump(data_set, stream)
with open("phsp_set.pickle", "wb") as stream:
    pickle.dump(phsp_set, stream)
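To check that a pickled data set can be read back unchanged, a round trip like the following can help. It is a self-contained sketch that uses a toy {obj}`dict` of {class}`numpy.ndarray`s and a temporary directory. The key `"phi"` and all values are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Toy stand-in for a data set of kinematic variables (made-up values)
toy_set = {
    "m_12": np.array([0.98, 1.50, 1.71]),
    "phi": np.array([0.1, -2.3, 3.0]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_set.pickle"
    with open(path, "wb") as stream:
        pickle.dump(toy_set, stream)
    with open(path, "rb") as stream:
        loaded_set = pickle.load(stream)

# The loaded arrays should be identical to the ones we dumped
identical = all(
    np.array_equal(toy_set[key], loaded_set[key]) for key in toy_set
)
```

Note that {func}`pickle.load` can execute arbitrary code, so only load pickle files that you created yourself or that come from a source you trust.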

In the [next step](3_perform_fit), we will illustrate how to 'perform a fit' with {mod}`tensorwaves` by optimizing the intensity model to these data samples.