# `GenericSeries` Tutorial

## Introduction

The `GenericSeries` class describes a potentially multi-dimensional quantity that depends on one or more dimensions.
Some examples are the position of the welding torch that depends on time or the workpiece temperature field that depends on time and space.
The data of the `GenericSeries` can be stored either in the form of explicit values or as a mathematical expression.
Its main feature is that you can evaluate the data at any given coordinate of the dimensions it depends on.
This happens either through interpolation if the data is discrete or through direct evaluation of the mathematical expression.
We can use the `GenericSeries` in our scripts and Jupyter notebooks by importing it from the WelDX Python package:

In [None]:
from weldx import GenericSeries

For this tutorial, we will also need to import the following packages and classes:

In [None]:
from weldx import Q_
from xarray import DataArray

import matplotlib.pyplot as plt
import numpy as np

## Terminology

Before we can start with the actual tutorial, we need to discuss some terminology.
It is essential to understand the differences between the following terms to avoid confusion throughout the course of this tutorial.

**Dimension**

Each dimension describes a single degree of freedom. 
We can think of it as a 1d-coordinate axis. 
Using multiple dimensions will create a multi-dimensional space. 
A typical example would be the dimensions $x$, $y$, and $z$ that form 3d-space. 
Another popular dimension is time.

**Coordinates**

A coordinate is a specific value or label on the 1d-axis of a dimension. 
We can specify the location of a point in 3d-space by providing its coordinates. 
For example, we can use $x=1m$, $y=3m$, and $z=0m$.
These are coordinates for the dimensions $x$, $y$, and $z$.
In short: dimensions represent degrees of freedom, whereas coordinates are specific values along a dimension.

**Variable**

If a mathematical expression is used to describe the `GenericSeries`, the individual terms of this expression can be divided into two groups. 
The first group consists of variables. 
Variables are symbols that don't get values assigned to them during the creation of a `GenericSeries`.
They let us evaluate the expression for different coordinates.
Consider the following expression:

$$
2 \cdot x + 3
$$

Here, $x$ is our variable. 
We can evaluate this expression over and over again by providing different values/coordinates for $x$.
For example, if we use $x=2$, the result is $7$. 
With $x=4$ we would get $11$. 
An important fact to note is that each variable of a `GenericSeries`' expression is a dimension. 
But not every dimension of an expression is necessarily represented by a variable.
We will show some code examples later that make this more understandable.
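
As a plain-Python illustration of the variable concept (this is not the `GenericSeries` API itself), the expression above can be written as a function whose only argument is the variable $x$. The function name `expression` is made up for this sketch:

```python
def expression(x):
    """Evaluate 2 * x + 3 for a given coordinate of the variable x."""
    return 2 * x + 3


# Each call supplies a different coordinate for the variable x.
results = [expression(x) for x in (2, 4)]
print(results)  # [7, 11]
```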

**Parameter**

The second group of terms in an expression-based `GenericSeries` consists of parameters.
Parameters are also symbols of an expression, but in contrast to variables, they already have fixed values assigned to them.
Consider the following expression:

$$
a \cdot t + b
$$

with:

$$
\begin{matrix}
a=&3m/s\\
b=&5m
\end{matrix}
$$

$a$ and $b$ are parameters because they have values assigned to them. 
$t$ is still a variable.
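
The split between parameters and variables can be sketched in plain Python with a closure. This is only an analogy and not how `GenericSeries` is implemented. The names `make_expression` and `position` are hypothetical, and units are dropped for simplicity:

```python
def make_expression(a, b):
    """Return a function of the variable t with the parameters a and b fixed."""

    def evaluate(t):
        return a * t + b

    return evaluate


# Parameters get their values once, when the expression is created
# (units are dropped here: a is in m/s, b in m, t in s).
position = make_expression(a=3, b=5)

# The variable t stays free and can be evaluated repeatedly.
values = [position(t) for t in (0, 1, 2)]
print(values)  # [5, 8, 11]
```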

## Discrete data

### Construction

As mentioned in the introduction, the `GenericSeries` can describe a dimension-dependent quantity either by a set of discrete values or by a mathematical expression.
We will start this tutorial with discrete values.

Let's say we want to describe the temperature of a specimen along our welding groove during a single-pass welding experiment.
The spatial direction along the groove is the dimension `x`.
Time is represented by the dimension `t`.
We have measured the temperature at 4 different points in time and at 6 different positions.
Our data, measured in Kelvin, is:

In [None]:
t_0 = [300, 300, 300, 300, 300, 300]
t_1 = [800, 1200, 400, 300, 300, 300]
t_2 = [450, 500, 600, 800, 1200, 400]
t_3 = [412, 425, 450, 500, 600, 800]

data = Q_([t_0, t_1, t_2, t_3], "K")

We also know the coordinates of the data in `x` and `t`:

In [None]:
coords_t = Q_([0, 10, 20, 30], "s")
coords_x = Q_([0, 5, 10, 15, 20, 25], "cm")

Here is a quick plot of our temperature data:

In [None]:
plt.plot(
    coords_x.m, np.transpose(data.m),
    label=[f"t={v}" for v in coords_t]
)
plt.gca().legend()

Now we can create our `GenericSeries` as follows:

In [None]:
gs_discrete = GenericSeries(
    data=data, 
    dims=["t", "x"], 
    coords={"t":coords_t, "x":coords_x}
)
gs_discrete

> TODO: Check and update discrete `__repr__` -> do not print all values, don't print Coordinates twice

The first argument is the raw data.
`dims` expects a list of strings that we can use to give our dimensions names.
With `coords` we provide the coordinates of our discrete values.
`dims` and `coords` are optional.
If you don't provide dimension names, the `GenericSeries` will use default names:

In [None]:
GenericSeries(data=data).dims

If you are already familiar with the `xarray` Python package, you might have noticed the similarities between the construction of a `GenericSeries` and an `xarray.DataArray`.
In fact, the discrete version of the `GenericSeries` is based on an `xarray.DataArray` and they share some interfaces with comparable behavior.
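
For comparison, here is a minimal sketch of how the same temperature data could be stored in a plain `xarray.DataArray`. It uses unitless numbers instead of quantities and only illustrates the similar constructor arguments. It makes no claim about how `GenericSeries` wraps the array internally:

```python
import numpy as np
from xarray import DataArray

# Same temperature values as above, as plain numbers (Kelvin) without units.
values = np.array(
    [
        [300, 300, 300, 300, 300, 300],
        [800, 1200, 400, 300, 300, 300],
        [450, 500, 600, 800, 1200, 400],
        [412, 425, 450, 500, 600, 800],
    ]
)

da = DataArray(
    values,
    dims=["t", "x"],
    coords={"t": [0, 10, 20, 30], "x": [0, 5, 10, 15, 20, 25]},
)

print(da.dims)  # ('t', 'x')

# Label-based selection: temperature at t=10 (s) and x=5 (cm).
print(da.sel(t=10, x=5).item())  # 1200
```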

### Accessing data

If you want to access a single item, you can use the `[]` operator to select elements by index:

In [None]:
gs_discrete[3,4]

Slicing is also possible:

In [None]:
gs_discrete[2:4,:]

> TODO: implement and demonstrate sel function like in xarray

## Evaluation/Interpolation

Even though the `GenericSeries` might be based on discrete values, you should think of it as some kind of mathematical function object that can be evaluated at any coordinate along its dimensions.
To do so, we simply use the call operator `()` on our `GenericSeries` and specify the coordinates we are interested in.
For example, we might be interested in the temperature at $x=12cm$ and $t=24s$.
The coordinates are passed as keyword arguments, where the key is the dimension name and the value is the coordinate (or the set of coordinates) we are interested in:

> **TODO: Really IMPORTANT** -> We need to ensure that the units of the coordinates are handled correctly. Currently, only the magnitude is used, so passing "nm" or "m" instead of "cm" has no effect on the result. Even a totally unrelated unit is accepted without any problems.


In [None]:
gs_discrete(t="24s", x="12cm")

It is not necessary to provide coordinates for all dimensions.
A single dimension is already enough:

In [None]:
gs_discrete(t="24s")

Of course, we can also evaluate multiple coordinate values for each dimension:

In [None]:
gs_discrete(t=Q_([11, 23], "s"), x=Q_([3, 14, 22], "cm"))

You may have noticed that we exclusively used coordinate values that do not match the coordinates we initially provided to the `GenericSeries`.
The actual data values are obtained by interpolation.
By default, the `GenericSeries` uses linear interpolation.
It can be changed during construction using the `interpolation` parameter or by assigning a new value using the `interpolation` setter:

> TODO: mention interpolation outside of boundaries

In [None]:
gs_discrete.interpolation = "linear"

Let's interpolate the data for $t=15s$ and plot it together with the two closest timesteps:

In [None]:
plt.plot(coords_x.m, np.transpose(gs_discrete(t="15s").data[0].m), label="t=15s")
plt.plot(
    coords_x.m, np.transpose(gs_discrete[1:3].data.m), 
    label=[f"t={v}s" for v in gs_discrete.data_array.t[1:3].data]
)
plt.gca().legend();

As one might expect, the linearly interpolated data is the mean of both curves, since $t=15s$ lies exactly halfway between $t=10s$ and $t=20s$.
However, that doesn't really look like the correct temperature distribution for a single torch moving along the groove.
Instead, the peak value should translate from left to right.
Of course, with dense data from real measurements, this would be just a minor issue with no practical relevance, but it serves as a nice transition to our next topic.
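
The midpoint observation can be checked independently of `GenericSeries` with a small NumPy sketch. It repeats the measured values as plain floats and uses `numpy.interp` column by column. The helper name `interpolate_in_time` is made up for this example:

```python
import numpy as np

coords_t = np.array([0.0, 10.0, 20.0, 30.0])  # seconds
data = np.array(
    [
        [300, 300, 300, 300, 300, 300],
        [800, 1200, 400, 300, 300, 300],
        [450, 500, 600, 800, 1200, 400],
        [412, 425, 450, 500, 600, 800],
    ],
    dtype=float,
)  # Kelvin, rows follow coords_t


def interpolate_in_time(t):
    """Linearly interpolate every column of `data` at time t (in seconds)."""
    return np.array([np.interp(t, coords_t, column) for column in data.T])


profile_15s = interpolate_in_time(15.0)
mean_profile = 0.5 * (data[1] + data[2])
print(profile_15s)  # [625. 850. 500. 550. 750. 350.]
print(np.allclose(profile_15s, mean_profile))  # True
```

The interpolated profile has two local maxima (850 K at 5 cm and 750 K at 20 cm) instead of a single peak in between, which is exactly the artifact described above.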

## Using Expressions

### A simple example

Another way to define a `GenericSeries` is using mathematical expressions.
In contrast to the previously shown approach we do not need to generate and store a lot of discrete data.
All we need is a simple formula.
Additionally, we do not get interpolation errors as in the previous section since we can evaluate the expression exactly for any given set of coordinates.

Let's start with a more or less simple example.
The following equation resembles a wave that travels towards increasing $x$ values with increasing time $t$:

$$
f\left(x,t\right)=\mathrm{tanh}\left(\frac{x-t}{5}\right) - \mathrm{tanh}\left(x-t-10\right)
$$
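
Because $f$ depends on $x$ and $t$ only through the difference $x-t$, advancing time by some amount shifts the whole profile by the same amount in $x$. A short NumPy check of this property, independent of `GenericSeries` (the function name `wave` is chosen for this sketch), could look like this:

```python
import numpy as np


def wave(x, t):
    """Evaluate f(x, t) = tanh((x - t) / 5) - tanh(x - t - 10)."""
    return np.tanh((x - t) / 5) - np.tanh(x - t - 10)


x = np.arange(0, 40)

# Position of the maximum on the integer grid for two points in time.
peak_t0 = x[np.argmax(wave(x, 0))]
peak_t5 = x[np.argmax(wave(x, 5))]
print(peak_t5 - peak_t0)  # 5

# The profile at t=5 is the profile at t=0 shifted by 5 in x.
print(np.allclose(wave(x + 5, 5), wave(x, 0)))  # True
```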

As in the previous section, the slope on the right-hand side of the peak is much steeper than the one on the left-hand side.
We now translate this equation into a string that can be understood by the `GenericSeries`:

In [None]:
expr = "tanh((x-t)/5) - tanh(x-t-10)"

The syntax is pretty close to Python code, except that it is enclosed in a string.
Now we could create a `MathematicalExpression` using this expression string and pass it to the `GenericSeries`, but it is much easier to simply pass it directly to the `GenericSeries`:

> TODO Link MathExpr tutorial

In [None]:
gs_expr = GenericSeries(expr)

We have now created a `GenericSeries` based on an expression.
Wasn't that hard, right?
Let's print it and have a look at its representation:

In [None]:
gs_expr

> TODO: fix `__repr__`

The first item of the output is the expression we entered, but there are also the fields `Parameters`, `Dimensions`, and `Units`.
Our current `GenericSeries` has no parameters (see terminology at the beginning) since we did not define any so far.
The dimensions `x` and `t` were automatically extracted from the provided expression.
The field `Units` refers to the units of our quantity after evaluating the expression.
As you can see, the field is currently empty and we will soon understand why this is the case.
But first, we will evaluate our equation as we did before with the discrete version, except that we will not use units here.
Again, we will talk about this in a few moments:

In [None]:
coords_t = [-5, 5, 15]
coords_x = list(range(25))
result = gs_expr(t=coords_t, x=coords_x)
result

The result is a new `GenericSeries` with discrete values at the coordinates we provided.
Let's create a plot from the data:

In [None]:
plt.plot(
    result.data_array.x,
    np.transpose(result.data.m),
    label=[f"t={v}" for v in result.data_array.t.data]
)
plt.gca().legend();

### Adding parameters

Parameters let us scale and shift the wave from the previous section without changing its variables.
In the expression below, `s` and `o` are such parameters: they receive their values through the `parameters` argument during construction, while `x` and `t` remain variables.

In [None]:
expr_param = "s*(tanh((x-t)/5) - tanh(x-t-10)) + o"

In [None]:
gs_expr_param = GenericSeries(expr_param, parameters=dict(s=450, o=300))
gs_expr_param

In [None]:
result_expr_param = gs_expr_param(t=coords_t, x=coords_x)
plt.plot(
    coords_x,
    np.transpose(result_expr_param.data.m),
    label=[f"t={v}" for v in coords_t]
)
plt.gca().legend();

### Adding units

In [None]:
# coords_x is a plain list at this point, so it has no `.m` attribute
plt.plot(coords_x, np.transpose(gs_expr(x=coords_x, t=0).data.m))

In [None]:
expr = "tanh((x-t)/5) - tanh((x-t-10))"
gs_expr = GenericSeries(expr)
c = list(range(25))
plt.plot(c, np.transpose(gs_expr(x=c, t=0).data.m))

In [None]:
from weldx.core import GenericSeries, MathematicalExpression, Q_

In [None]:
expr = "a*t + b"

In [None]:
me = MathematicalExpression(
    expr,
    parameters={"a": (Q_([10, 1, 1], "m/s"), "c"), "b": (Q_([1, 2, 3], "m"), "d")},
)

In [None]:
gs = GenericSeries(me, units={"t": "s"})

In [None]:
gs(t=Q_("3s"))