# Working with ASCII momentum tuple files

Pawian usually imports its data from momentum tuples written to an ASCII text file. Each tuple occupies one line of four values: the energy and the $x$, $y$, $z$-components of the 3-momentum (in the example below, the energy is the last column). The lines are grouped by event, and each event can be preceded by a line with the event weight. An example of two weighted events of three particles each would be:
```
0.99407
-0.00357645   0.0962561   0.0181079    0.170545
   0.224019    0.623156    0.215051     1.99057
  -0.174404   -0.719412   -0.233159      2.0243
0.990748
 -0.0328198   0.0524406   0.0310079    0.155783
  -0.619592    0.141315     0.32135     1.99619
   0.698477   -0.193756   -0.352357     2.03593
```
The `pawian.data` module imports such an ASCII file to a nicely formatted [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and provides a few accessors that facilitate visualisation of the content of the ASCII file.
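
To make the file layout concrete, here is a minimal sketch in plain Python of how such a file could be grouped into events. It is not how `pawian.data` is implemented; the function name `parse_weighted_events` is made up for this illustration, and it assumes that every event starts with a weight line followed by a fixed number of four-value lines.

```python
from io import StringIO

EXAMPLE = """\
0.99407
-0.00357645   0.0962561   0.0181079    0.170545
   0.224019    0.623156    0.215051     1.99057
  -0.174404   -0.719412   -0.233159      2.0243
0.990748
 -0.0328198   0.0524406   0.0310079    0.155783
  -0.619592    0.141315     0.32135     1.99619
   0.698477   -0.193756   -0.352357     2.03593
"""


def parse_weighted_events(stream, n_particles):
    """Group the lines of `stream` into (weight, [tuple, ...]) events.

    Hypothetical helper: assumes each event is one weight line followed by
    exactly `n_particles` lines of four values.
    """
    rows = [line.split() for line in stream if line.strip()]
    step = n_particles + 1
    events = []
    for start in range(0, len(rows), step):
        weight_row, *tuple_rows = rows[start:start + step]
        layout_ok = (
            len(weight_row) == 1
            and len(tuple_rows) == n_particles
            and all(len(row) == 4 for row in tuple_rows)
        )
        if not layout_ok:
            raise ValueError(f"unexpected layout near row {start}")
        weight = float(weight_row[0])
        tuples = [tuple(float(value) for value in row) for row in tuple_rows]
        events.append((weight, tuples))
    return events


events = parse_weighted_events(StringIO(EXAMPLE), n_particles=3)
for weight, tuples in events:
    print(f"weight={weight}, number of tuples={len(tuples)}")
```

An unweighted file would need a variant without the weight line; the sketch does not try to detect that case.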

Because we work with a `pandas.DataFrame`, we can also make selections of the content and write the filtered data set to another ASCII file for Pawian, or to any other format that `pandas.DataFrame` [already supports](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#serialization-io-conversion).

## Import data

In this example, we use the test files provided in the `samples` folder of the `pawian` package in the repository.

In [None]:
from os.path import dirname, realpath

import pawian

# The sample files live in the `samples` folder next to the pawian package
sample_dir = f"{dirname(realpath(pawian.__file__))}/samples"
filename_data = f"{sample_dir}/momentum_tuples_data.dat"
filename_mc = f"{sample_dir}/momentum_tuples_mc.dat"

The data file contains momentum tuples for the process $e^+e^- \to \pi^+D^0D^-$ (in that order!). This information can be passed on to the `read_ascii` function to create a `pandas.DataFrame`.

In [None]:
from pawian.data import read_ascii
particles = ['pi+', 'D0', 'D-']
frame = read_ascii(filename_data, particles=particles)
frame

## Investigate content of the dataframe

Notice that the dataframe makes use of [multi-indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) for the columns. This allows us, for instance, to easily select the columns of individual particles, like this:

In [None]:
frame['pi+'] + frame['D-']

In [None]:
frame[['pi+', 'D0']].mean()
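
For readers who do not have the sample files at hand, the column multi-indexing can be reproduced with a small hand-built frame. This is only a sketch: the component labels `p_x`, `p_y`, `p_z`, `E` are an assumption and need not match the labels that `read_ascii` produces, and the weight column is left out. The numbers are the two events from the example file shown earlier.

```python
import pandas as pd

particles = ["pi+", "D0", "D-"]
components = ["p_x", "p_y", "p_z", "E"]  # assumed labels, for illustration only
columns = pd.MultiIndex.from_product([particles, components])

# One row per event: three particles times four components
toy = pd.DataFrame(
    [
        [-0.00357645, 0.0962561, 0.0181079, 0.170545,
         0.224019, 0.623156, 0.215051, 1.99057,
         -0.174404, -0.719412, -0.233159, 2.0243],
        [-0.0328198, 0.0524406, 0.0310079, 0.155783,
         -0.619592, 0.141315, 0.32135, 1.99619,
         0.698477, -0.193756, -0.352357, 2.03593],
    ],
    columns=columns,
)

pion = toy["pi+"]  # selecting the top level leaves the four component columns
summed = toy["pi+"] + toy["D-"]  # element-wise sum, aligned on the component labels
two_particles = toy[["pi+", "D0"]]  # a list of top-level labels keeps both levels
print(summed)
```

Selecting a single top-level label drops that level, which is why the two sub-frames in the sum line up on their component columns.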

Even better, we immediately have all the powerful techniques of `pandas.DataFrame` at our disposal:

In [None]:
frame['D-'].hist(bins=50);

In [None]:
frame['weight'].hist(bins=80);

## Special accessors

Now that we have imported from the `pawian.data` sub-module, a few simple [accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#registering-custom-accessors) have been registered on `pandas.DataFrame`. They are called through the `pawian` namespace of the dataframe, like so:

In [None]:
print("Has weights:       ", frame.pawian.has_weights)
print("Contains particles:", frame.pawian.particles)
print("Contains momenta:  ", frame.pawian.momentum_labels)
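
The mechanism behind such a namespace is the accessor registration that pandas offers. The following sketch registers a toy accessor under the hypothetical name `demo`; the class and its single property are invented for illustration and are not the actual `pawian` accessor.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("demo")  # hypothetical namespace
class DemoAccessor:
    """Toy accessor, not the one that pawian registers."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is called on
        self._obj = pandas_obj

    @property
    def has_weights(self):
        # True if the frame has a column called "weight"
        return "weight" in self._obj.columns


toy_weighted = pd.DataFrame({"weight": [0.99407, 0.990748], "E": [0.170545, 0.155783]})
toy_unweighted = pd.DataFrame({"E": [0.170545, 0.155783]})
print(toy_weighted.demo.has_weights, toy_unweighted.demo.has_weights)
```

Because the decorator runs when the module is imported, the namespace only becomes available after the import, which matches the behaviour described above.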

The accessors also give access to kinematic variables:

In [None]:
frame.pawian.p_xyz

In [None]:
frame.pawian.mass.mean()
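
As a cross-check of what such kinematic variables mean, the magnitude of the 3-momentum and the invariant mass $m = \sqrt{E^2 - |\vec{p}|^2}$ can be computed by hand for the first tuple of the example file. This is a sketch in plain Python with made-up helper names; it assumes natural units and the column order of the example (momentum components first, energy last), and it says nothing about how the `pawian` accessor computes these quantities.

```python
import math


def momentum_magnitude(p_x, p_y, p_z):
    """Length of the 3-momentum vector."""
    return math.sqrt(p_x**2 + p_y**2 + p_z**2)


def invariant_mass(p_x, p_y, p_z, energy):
    """Invariant mass of a four-momentum in natural units."""
    mass_squared = energy**2 - (p_x**2 + p_y**2 + p_z**2)
    # Design choice of this sketch: rounding in the input can push the value
    # slightly below zero, so it is clamped before taking the square root.
    return math.sqrt(max(mass_squared, 0.0))


# First tuple of the first event in the example file
pion = (-0.00357645, 0.0962561, 0.0181079, 0.170545)
pion_momentum = momentum_magnitude(*pion[:3])
pion_mass = invariant_mass(*pion)
print(f"|p| = {pion_momentum:.4f}, m = {pion_mass:.4f}")
```

If the values in the file are in GeV, the result is close to the charged pion mass of about 0.1396 GeV.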

And the best part: you can simply add the four-vectors and analyse, for instance, their combined invariant mass!

In [None]:
dm = frame['D-']
pip = frame['pi+']
(dm + pip).pawian.mass.hist(bins=100);
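
The same operation can be spelled out for a single event. In this plain-Python sketch the four-momenta of the $D^-$ and the $\pi^+$ from the first example event are added component by component before the invariant mass is taken; the column order (momentum components first, energy last) is again an assumption read off from the example file.

```python
import math

# Third and first tuple of the first event in the example file
d_minus = (-0.174404, -0.719412, -0.233159, 2.0243)
pi_plus = (-0.00357645, 0.0962561, 0.0181079, 0.170545)

# Four-vectors add component by component
combined = tuple(a + b for a, b in zip(d_minus, pi_plus))
p_x, p_y, p_z, energy = combined

combined_mass = math.sqrt(energy**2 - (p_x**2 + p_y**2 + p_z**2))
print(f"m(D- pi+) = {combined_mass:.4f}")
```

The combined mass is larger than the sum of the two individual masses, as it has to be for two particles that move relative to each other.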

## Selecting and exporting

As mentioned, `pandas.DataFrame` allows us to make selections, for example on the event weight:

In [None]:
weights = frame['weight']
selection = frame[weights > .95]
selection

The frame can then be exported to an ASCII file that can be parsed by Pawian, like so:

In [None]:
output_file = 'selected_data.dat'
selection.pawian.write_ascii(output_file)

In [None]:
imported_frame = read_ascii(output_file, particles=particles)
imported_frame
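
At the file level, selecting and exporting amounts to filtering events and writing each one back as a weight line followed by its tuple lines. The sketch below does this in plain Python for the two example events. The helper `write_weighted_events` and its number formatting are assumptions made for illustration and may differ from what the `write_ascii` accessor produces.

```python
from io import StringIO

# The two events of the example file as (weight, tuples) pairs
events = [
    (0.99407, [
        (-0.00357645, 0.0962561, 0.0181079, 0.170545),
        (0.224019, 0.623156, 0.215051, 1.99057),
        (-0.174404, -0.719412, -0.233159, 2.0243),
    ]),
    (0.990748, [
        (-0.0328198, 0.0524406, 0.0310079, 0.155783),
        (-0.619592, 0.141315, 0.32135, 1.99619),
        (0.698477, -0.193756, -0.352357, 2.03593),
    ]),
]


def write_weighted_events(events, stream):
    """Write each event as a weight line followed by one line per tuple."""
    for weight, tuples in events:
        stream.write(f"{weight}\n")
        for values in tuples:
            # Right-aligned columns with six significant digits (assumed format)
            stream.write(" ".join(f"{value:>11.6g}" for value in values) + "\n")


# Keep only the events whose weight exceeds the threshold
selected = [event for event in events if event[0] > 0.992]

buffer = StringIO()
write_weighted_events(selected, buffer)
written_lines = buffer.getvalue().splitlines()
print(buffer.getvalue())
```

Writing to a `StringIO` buffer keeps the example free of side effects; passing an opened file object instead would write the same text to disk.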