## Test and demonstrate usage of PhyPraKit stand-alone tools

PhyPraKit provides several Python scripts that perform basic actions on data and fit models
defined in a *yaml* file. In general, no additional code of your own is needed.

  - *plotData*   plot data and uncertainties from file in *yaml* format
  - *plotCSV*    plot data from a file in CSV format; the German decimal comma ',' is replaced by '.'
  - *run_phyFit* run a fit defined in a *yaml* file
  - *csv2yml*    convert data in CSV format (e.g. MS Excel export) to *yaml* format
  - *smoothCSV*  resample data from a CSV file

The **kafe2** package also provides a stand-alone tool,

   - kafe2go     run a fit with *kafe2* from an input file in *yaml* format

Scripts are executed with the Jupyter *%run* magic command. For this to work, the
Python script must be specified with its full path or be located in the current Jupyter working directory.

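Before calling *%run*, it can help to check that the script is actually found. The following is a minimal sketch, assuming the scripts are plain files on disk; the helper name `find_script` and the search directories are hypothetical and not part of PhyPraKit:

```python
from pathlib import Path


def find_script(name, search_dirs=(".",)):
    """Return the absolute path of the first match for `name`, or None.

    `name` may be given with or without the ".py" suffix.
    """
    candidates = [name] if name.endswith(".py") else [name, name + ".py"]
    for directory in search_dirs:
        for candidate in candidates:
            path = Path(directory) / candidate
            if path.is_file():
                return path.resolve()
    return None


# Look in the current working directory only (hypothetical usage)
script = find_script("plotCSV")
print(script if script else "plotCSV.py not found here; give %run the full path")
```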

### General remarks

The stand-alone scripts take a number of parameters on the command line. If a script is started without
any parameters, usage help is printed. See this example:

In [None]:
%run plotCSV.py

### Plot data from CSV file

In [None]:
%run plotCSV -H 2 Wellenform.csv

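The tool list above describes *plotCSV* as replacing the German decimal comma by a dot. A minimal sketch of that idea follows, assuming a ';'-delimited file with a fixed number of header lines. The function name `read_decimal_comma_csv` and the sample text are made up for illustration and do not reflect how *plotCSV* is implemented:

```python
import csv
import io


def read_decimal_comma_csv(text, delimiter=";", n_header=1):
    """Parse delimited text with decimal commas into header rows and float columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[:n_header], rows[n_header:]
    # Replace ',' by '.' in every field before converting to float
    records = [[float(field.replace(",", ".")) for field in row] for row in body if row]
    # Transpose the records so that each column becomes one list
    columns = [list(col) for col in zip(*records)]
    return header, columns


sample = "t;U\n0,0;1,25\n0,5;1,50\n1,0;1,75\n"  # made-up data
header, columns = read_decimal_comma_csv(sample)
print(header)   # [['t', 'U']]
print(columns)  # [[0.0, 0.5, 1.0], [1.25, 1.5, 1.75]]
```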
## Statistical analysis of measured data

The first step in data analysis usually consists of inspecting a frequency distribution of measured data.
The program *plotData* contains the necessary code; it shows the distribution and calculates the mean
and standard deviation of the data. The file *simple_data.yaml*, as shown below, contains all the necessary
input and can easily be edited with the editor provided by Jupyter. Just double-click on the file name
in the file list on the left-hand side of your Jupyter window to open it. To generate a new file, right-click
in the list, provide a name for a new, empty file, and open it by double-clicking.

```
  # Beispiel einer Histogramm-Darstellung
  # -------------------------------------
  type: histogram
  title: "Wiederholte Messungen von Tischhöhen"
  label: Beispieldaten
  x_label: 'Höhe h (cm)'
  y_label: 'Verteilungsdichte f(h)'

  # Daten:
  raw_data: [
  79.83,79.63,79.68,79.82,80.81,79.97,79.68,80.32,79.69,79.18,
  80.04,79.80,79.98,80.15,79.77,80.30,80.18,80.25,79.88,80.02 ]
  n_bins: 20
  bin_range: [79., 81.]
  # alternatively an array for the bin edges can be specified
  #bin_edges: [79., 79.5, 80, 80.5, 81.]

  model_label: Gauss-Verteilung
  model_density_function: |
    def normal_distribution(x, mu=79.9, sigma=0.346):
      return np.exp(-0.5 *((x-mu)/sigma)**2)/np.sqrt(2.*np.pi*sigma**2)
```
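As a cross-check of what the histogram example above computes, the mean and standard deviation of `raw_data` can be reproduced with the Python standard library. This is a sketch, not the code used by *plotData*; it assumes the sample standard deviation (n - 1 in the denominator), and the density function is rewritten with `math` instead of `np` so that it runs without NumPy:

```python
import math
import statistics

# Data copied from simple_data.yaml above
raw_data = [
    79.83, 79.63, 79.68, 79.82, 80.81, 79.97, 79.68, 80.32, 79.69, 79.18,
    80.04, 79.80, 79.98, 80.15, 79.77, 80.30, 80.18, 80.25, 79.88, 80.02,
]

mean = statistics.mean(raw_data)
sigma = statistics.stdev(raw_data)  # sample standard deviation (n - 1)
print(f"mean = {mean:.3f} cm, standard deviation = {sigma:.3f} cm")


def normal_distribution(x, mu=79.9, sigma=0.346):
    """Gaussian density, same formula as in the yaml file above."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2.0 * math.pi * sigma**2)


# Expected number of entries in a bin centred on the peak,
# for n_bins = 20 in the range [79, 81], i.e. a bin width of 0.1 cm
bin_width = (81.0 - 79.0) / 20
expected_at_peak = len(raw_data) * bin_width * normal_distribution(79.9)
print(f"expected entries per bin at the peak: {expected_at_peak:.2f}")
```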

The simple command to run the example looks like this: 

In [None]:
%run plotData simple_data.yaml

### Run a simple fit

Fitting models to experimental data, or parametrizing measurements with a functional dependence,
is one of the routine tasks in data analysis. Two powerful fitting programs relying on
the *phyFit* or *kafe2* packages are provided:

 - kafe2go
 - run_phyFit


First, let us see how the interfaces are defined by running the scripts with the -h option (for "help"):

In [None]:
%run run_phyFit -h

In [None]:
%run kafe2go.py -h

Now, run a very simple fit of a straight line to data with only independent uncertainties in the x- and y-directions, as
specified in the file *simpleFit.fit*. You may want to inspect the input by double-clicking on the file name. 

In [None]:
%run run_phyFit simpleFit.fit

In [None]:
%run kafe2go simpleFit.fit

#### A more complex fit example with different types of uncertainties

To inspect the input file, double-click on the name *test_xy.fit* in the directory listing on the left-hand side.
It will open in the editor in a new tab. It is possible to change this file and try out modifications. Executing
this example is no more complicated than the first one:

In [None]:
%run run_phyFit.py test_xy.fit

Note that a simplified data format is used above relying on default values for the properties of uncertainties,
which are assumed to be independent, absolute and uncorrelated if not specified otherwise. Running *kafe2* yields
the same result as *phyFit*: 

In [None]:
%run kafe2go --asymmetric test_xy.fit

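The notions "independent", "absolute" and "relative" can be made concrete with a covariance matrix. The following sketch is an illustration only: the function `build_covariance` and its arguments are hypothetical and are not the interface of *phyFit* or *kafe2*. It assumes that independent uncertainties contribute only to the diagonal, while an uncertainty common to all data points also fills the off-diagonal elements:

```python
def build_covariance(values, independent_abs=0.0, independent_rel=0.0, common_abs=0.0):
    """Build a covariance matrix (list of lists) from simple uncertainty components.

    independent_abs: absolute uncertainty, independent for each point
    independent_rel: relative uncertainty, independent for each point
    common_abs:      absolute uncertainty shared by all points (fully correlated)
    """
    n = len(values)
    # A fully correlated component adds the same variance to every element
    cov = [[common_abs**2] * n for _ in range(n)]
    for i in range(n):
        # Independent components add to the diagonal only
        cov[i][i] += independent_abs**2 + (independent_rel * values[i]) ** 2
    return cov


y = [1.0, 2.0, 4.0]  # made-up measurements
cov = build_covariance(y, independent_abs=0.1, independent_rel=0.05, common_abs=0.2)
for row in cov:
    print(["%.4f" % element for element in row])
```

With all arguments other than `independent_abs` left at zero, the matrix is diagonal, which corresponds to the default case described above.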
## Fitting a model to histogram data
Fitting a model to histogram data is also possible. Note that in this case both
*phyFit* and *kafe2* use a negative log-likelihood function that takes into account the
Poisson nature of the uncertainties. Here is the command to run:

In [None]:
%run run_phyFit.py hFit.fit

And here is the same with kafe2go:

In [None]:
%run kafe2go hFit.fit

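For binned data, each bin content n_i is treated as a Poisson-distributed count with an expectation mu_i given by the model. A sketch of the corresponding negative log-likelihood follows. It is for illustration only, is not the code used internally by *phyFit* or *kafe2*, and the bin contents are made up:

```python
import math


def poisson_nll(observed, expected):
    """Negative log-likelihood -sum_i ln P(n_i | mu_i) for Poisson-distributed bins."""
    nll = 0.0
    for n, mu in zip(observed, expected):
        if mu <= 0.0:
            raise ValueError("expected bin contents must be positive")
        # ln P(n | mu) = n * ln(mu) - mu - ln(n!)
        nll -= n * math.log(mu) - mu - math.lgamma(n + 1)
    return nll


observed = [2, 5, 9, 4, 1]               # made-up bin contents
good_model = [2.1, 5.2, 8.5, 4.3, 0.9]   # expectation close to the data
poor_model = [4.0, 4.0, 4.0, 4.0, 4.0]   # flat expectation

# A model that describes the data better has the smaller value
print(poisson_nll(observed, good_model), poisson_nll(observed, poor_model))
```

A fit varies the model parameters, and hence the expectations mu_i, until this quantity is minimal.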
### Handling data in CSV format

The CSV format (for Comma- or Character-Separated Values) is quite common in data science, and
many software packages export data in this format or support it (including Leybold Cassy
and MS Excel).

PhyPraKit provides a tool to ease the conversion to the more general *yaml* format. After
converting the input data, extra lines can be added using any text editor or, better, the
editor provided as part of Jupyter Notebooks. Here is an example without input showing
all options:

In [None]:
%run csv2yml.py -h

### The input to convert a file with audio data looks like this:

In [None]:
%run csv2yml.py AudioData.csv

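The idea behind the conversion can be sketched in a few lines: every CSV column becomes a key followed by a list of values. This is an illustration under stated assumptions (one header line with column names, numeric columns, ',' as field delimiter). The function `csv_to_yaml_text` and the sample data are hypothetical, and the real *csv2yml* output may be laid out differently:

```python
import csv
import io


def csv_to_yaml_text(text, delimiter=","):
    """Turn CSV text with one header line into 'key: [values]' lines."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    keys, records = rows[0], rows[1:]
    lines = []
    for index, key in enumerate(keys):
        # Collect one column and write it as a yaml list in flow style
        values = [float(record[index]) for record in records]
        lines.append(f"{key.strip()}: {values}")
    return "\n".join(lines)


sample = "t,U\n0.0,1.25\n0.5,1.5\n1.0,1.75\n"  # made-up data
print(csv_to_yaml_text(sample))
```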
The CSV tools of PhyPraKit can also handle the output of typical Windows programs that use decimal commas
instead of the internationally used dot. To be unambiguous, the field delimiter is then ";" and not the
usual comma. We just need to tell *csv2yml* about the delimiter to obtain valid *yaml* format from such an input:

In [None]:
%run csv2yml -d ";" Excel_output.csv

Using a text editor, the *yaml* block from the above output can be copied to a new *yaml* file. To create a new, empty file, right-click in the
directory list on the left-hand side, then double-click on the new file to open it. This file should contain additional information, most importantly
meta-data describing the origin of the data, adjustments of the field keys to be compatible with *run_phyFit* or *kafe2go*, and a fit model.

A valid fit-input file for a straight-line-fit looks like this:

```
x_data: [0.05, 0.36, 0.68, 0.8, 1.09, 1.46, 1.71, 1.83, 2.44, 2.09, 3.72, 4.36, 4.6]
y_data: [0.35, 0.26, 0.52, 0.44, 0.48, 0.55, 0.66, 0.48, 0.75, 0.7, 0.75, 0.8, 0.9]
y_errors: [0.06, 0.07, 0.05, 0.05, 0.07, 0.07, 0.09, 0.1, 0.11, 0.1, 0.11, 0.12, 0.1]
x_errors: 3% 

# model specification
model_label: 'line fit'
model_function: |
    def linModel(x, a=0, y0=1.):
      return y0 + a * x
```
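To see what such a fit does in the simplest case, the straight line can be fitted by weighted least squares using only `y_errors`. This sketch deliberately ignores the 3% `x_errors`, so its numbers are not expected to match the output of *run_phyFit* exactly. It is a simplified illustration and not the method implemented in *phyFit* or *kafe2*; the function name `weighted_line_fit` is made up:

```python
# Data copied from the fit-input file above
x_data = [0.05, 0.36, 0.68, 0.8, 1.09, 1.46, 1.71, 1.83, 2.44, 2.09, 3.72, 4.36, 4.6]
y_data = [0.35, 0.26, 0.52, 0.44, 0.48, 0.55, 0.66, 0.48, 0.75, 0.7, 0.75, 0.8, 0.9]
y_errors = [0.06, 0.07, 0.05, 0.05, 0.07, 0.07, 0.09, 0.1, 0.11, 0.1, 0.11, 0.12, 0.1]


def weighted_line_fit(x, y, sigma_y):
    """Return (a, y0) minimising sum(((y - y0 - a*x) / sigma_y)**2)."""
    w = [1.0 / s**2 for s in sigma_y]  # weights from independent y uncertainties
    s_w = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Solve the two normal equations for slope a and intercept y0
    det = s_w * s_xx - s_x**2
    a = (s_w * s_xy - s_x * s_y) / det
    y0 = (s_xx * s_y - s_x * s_xy) / det
    return a, y0


a, y0 = weighted_line_fit(x_data, y_data, y_errors)
print(f"a = {a:.3f}, y0 = {y0:.3f}")
```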

In [None]:
%run run_phyFit.py from_Excel.fit