# 2. Common DataStore functions
Examples of how to use some of the most common functions:

1. mean, min, max, std
2. Selecting
3. Selecting by index
4. Downsample (time dimension)
5. Upsample / Interpolation (length and time dimension)

In [None]:
import os

from dtscalibration import read_silixa_files

First we load the raw measurements into a `DataStore` object, as we learned from the previous notebook.

In [None]:
filepath = os.path.join("..", "..", "tests", "data", "single_ended")

ds = read_silixa_files(directory=filepath, timezone_netcdf="UTC", file_ext="*.xml")

## 0 Access the data
The implemented read routines try to read as much data from the raw DTS files as possible. The result usually contains the coordinates (time and space) and the Stokes and anti-Stokes measurements. We can access the data by key; each variable is presented as a `DataArray`. More examples are found at http://xarray.pydata.org/en/stable/indexing.html

In [None]:
ds["st"]  # the Stokes measurements, presented as a DataArray

In [None]:
ds["tmp"].plot(figsize=(12, 8));

## 1 mean, min, max, std
The first argument is the dimension along which the function is applied. `dim` can be any dimension (e.g., `time`, `x`). The returned `DataStore` no longer contains that dimension.

Normally, you would like to keep the attributes (the informative texts from the loaded files), so set `keep_attrs` to `True`. They take up hardly any space compared to your Stokes data, so keep them.

Note that the sections are also stored as an attribute. If you delete the attributes, you have to redefine the sections.

In [None]:
ds_min = ds.min(
    dim="time", keep_attrs=True
)  # Take the minimum of all data variables (e.g., Stokes, Temperature) along the time dimension

In [None]:
ds_max = ds.max(
    dim="x", keep_attrs=True
)  # Take the maximum of all data variables (e.g., Stokes, Temperature) along the x dimension

In [None]:
ds_std = ds.std(
    dim="time", keep_attrs=True
)  # Calculate the standard deviation along the time dimension
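To see what `keep_attrs` does without loading any DTS files, here is a minimal sketch on a synthetic `xarray.Dataset`. The dataset `demo`, its values, and the `note` attribute are made up for illustration; a `DataStore` is assumed to behave the same way for these reductions.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a DataStore: 5 points along the cable, 3 time steps.
demo = xr.Dataset(
    {"st": (("x", "time"), np.arange(15.0).reshape(5, 3))},
    coords={"x": np.linspace(0.0, 0.5, 5), "time": np.arange(3)},
    attrs={"note": "hypothetical metadata"},  # made-up attribute
)

with_attrs = demo.mean(dim="time", keep_attrs=True)
without_attrs = demo.mean(dim="time", keep_attrs=False)

print(with_attrs["st"].dims)  # the time dimension has been reduced away
print(with_attrs.attrs)       # attributes are kept
print(without_attrs.attrs)    # attributes are dropped
```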

## 2 Selecting
What if you would like to get the maximum temperature between $x \geq 20$ m and $x \leq 35$ m over time? We first have to select a section along the cable. Note that label-based slices include both end points.

In [None]:
section = slice(20.0, 35.0)
section_of_interest = ds.sel(x=section)

In [None]:
section_of_interest_max = section_of_interest.max(dim="x")

What if you would like to have the measurement at approximately $x=20$ m?

In [None]:
point_of_interest = ds.sel(x=20.0, method="nearest")
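The two kinds of label-based selection can be tried on a small synthetic `xarray.DataArray`. This is a sketch with a made-up 5 m grid, not the real cable coordinates: a `slice` keeps every coordinate between the bounds, including a bound that coincides with a coordinate value, and `method="nearest"` picks the closest available coordinate.

```python
import numpy as np
import xarray as xr

# Made-up coordinate grid: 0, 5, ..., 45 m.
x = np.arange(0.0, 50.0, 5.0)
da = xr.DataArray(x * 2.0, coords={"x": x}, dims="x")

# Label-based slice: both 20.0 and 35.0 are part of the selection.
inside = da.sel(x=slice(20.0, 35.0))
print(inside.x.values)

# Nearest-neighbour lookup: 21.0 is not on the grid, so 20.0 is returned.
nearest = da.sel(x=21.0, method="nearest")
print(float(nearest.x))
```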

## 3 Selecting by index
What if you would like to see the values at the first time steps, or at the first location along the cable? We can use `isel` (index select).

In [None]:
section_of_interest = ds.isel(time=slice(0, 2))  # The first two time steps

In [None]:
section_of_interest = ds.isel(x=0)
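One detail worth knowing about index selection is how it affects the dimensions. The sketch below uses a synthetic `xarray.DataArray` with made-up values: a slice keeps the dimension, an integer index drops it, and a one-element list keeps it with length 1.

```python
import numpy as np
import xarray as xr

# Synthetic data: 4 points along the cable, 3 time steps.
demo = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "time"),
    coords={"x": [0.0, 0.5, 1.0, 1.5], "time": [0, 1, 2]},
)

first_two_steps = demo.isel(time=slice(0, 2))  # slice: time dimension kept
first_point = demo.isel(x=0)                   # integer: x dimension dropped
first_point_kept = demo.isel(x=[0])            # list: x dimension kept, length 1

print(first_two_steps.shape)
print(first_point.dims)
print(first_point_kept.shape)
```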

## 4 Downsample (time dimension)
We currently have measurements at 3 time steps, 30.001 seconds apart. For the next exercise we downsample the measurements to 2 time steps, 47 seconds apart, using the function `resample_datastore`. The previously calculated variances are no longer valid after resampling.

In [None]:
ds_resampled = ds.resample_datastore(how="mean", time="47S")
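Why three time steps collapse into two can be checked with a few lines of NumPy. This is only a sketch of the binning idea: it assumes the 47-second bins are anchored at the first measurement, whereas the bin edges actually used by the library depend on how the resampling origin is chosen. The values in `values` are made up.

```python
import numpy as np

# Seconds since the first measurement, 30.001 s apart.
t = np.array([0.0, 30.001, 60.002])
values = np.array([10.0, 12.0, 20.0])  # made-up measurements

bin_width = 47.0
# Assumption: bins start at the first measurement.
bin_index = np.floor(t / bin_width).astype(int)
n_bins = int(bin_index.max()) + 1

# "mean" resampling: average the samples that fall into each bin.
means = np.array([values[bin_index == i].mean() for i in range(n_bins)])

print(bin_index)  # the first two samples share a bin
print(means)
```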

## 5 Upsample / Interpolation (length and time dimension)
We have measurements every 12 cm, starting at $x=0$ m. What if we would like to change our coordinate system to have a value every 12 cm starting at $x=0.05$ m? We use linear interpolation; extrapolation is not supported. The calculated variances are no longer valid after interpolation.

In [None]:
x_old = ds.x.data
x_new = x_old[:-1] + 0.05  # no extrapolation
ds_xinterped = ds.interp(coords={"x": x_new})
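The reason for dropping the last point can be seen in a small NumPy sketch of linear interpolation on a shifted grid. The grid and the linear test signal are made up, and `np.interp` stands in for the interpolation that `interp` performs.

```python
import numpy as np

# Made-up grid with 12 cm spacing and a linear test signal.
x_old = 0.12 * np.arange(6)
signal = 2.0 * x_old + 1.0

# Shift by 5 cm and drop the last point so that every target
# stays inside [x_old.min(), x_old.max()].
x_new = x_old[:-1] + 0.05
assert x_new.min() >= x_old.min() and x_new.max() <= x_old.max()

signal_new = np.interp(x_new, x_old, signal)

# Linear interpolation reproduces a linear signal exactly.
print(np.allclose(signal_new, 2.0 * x_new + 1.0))
```

Without `[:-1]`, the last target would lie 5 cm beyond the end of the measured range.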

We can do the same in the time dimension.

In [None]:
import numpy as np

time_old = ds.time.data
time_new = time_old[:-1] + np.timedelta64(10, "s")  # no extrapolation
ds_tinterped = ds.interp(coords={"time": time_new})
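When shifting time stamps, it can help to check which targets fall outside the measured range, since those cannot be interpolated. A minimal sketch with made-up time stamps (3 steps, 30 s apart):

```python
import numpy as np

# Made-up time stamps, 30 s apart.
time_old = np.array(
    ["2018-05-04T12:00:00", "2018-05-04T12:00:30", "2018-05-04T12:01:00"],
    dtype="datetime64[s]",
)

# Shift every step forward by 10 s.
time_shifted = time_old + np.timedelta64(10, "s")

# Targets later than the last measurement would require extrapolation.
outside = time_shifted > time_old.max()
time_inside = time_shifted[~outside]

print(outside)
print(time_inside)
```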