# Verbs

In this notebook we will show what kind of action each action block (i.e. each *verb*) performs. Remember from the [Intro notebook](intro.ipynb) that result instructions are formulated by combining basic building blocks into processing chains. These processing chains start with a reference (e.g. to a semantic concept) which during query processing is internally evaluated into a *data cube*. For a description of those, see the [References notebook](references.ipynb). Actions are distinct, well-defined data cube operations that can then be applied to these data cubes. Since they are all labelled with an action word that describes their task, we also call them *verbs*.

## Content

- [Verbs for single data cubes](#Verbs-for-single-data-cubes)
    - [Evaluate](#Evaluate)
    - [Extract](#Extract)
    - [Filter](#Filter)
        - [filter_time](#The-filter_time-and-filter_space-verbs)
        - [filter_space](#The-filter_time-and-filter_space-verbs)
    - [Reduce](#Reduce)
    - [Groupby](#Groupby)
        - [groupby_time](#The-groupby_time-and-groupby_space-verbs)
        - [groupby_space](#The-groupby_time-and-groupby_space-verbs)
    - [Label](#Label)
- [Verbs for data cube collections](#Verbs-for-data-cube-collections)
    - [Concatenate](#Concatenate)
    - [Compose](#Compose)
    - [Merge](#Merge)
    - [Evaluating single cube verbs on collections](#Evaluating-single-cube-verbs-on-collections)

## Prepare

Import the semantique package:

In [1]:
import semantique as sq

Import other packages we will use in this notebook:

In [2]:
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr
import json

Create the components for query processing. See the [Intro notebook](intro.ipynb) for details.

In [3]:
# Ontology.
with open("files/ontology.json", "r") as file:
    ontology = sq.ontology.Semantique(json.load(file))

# Factbase.
with open("files/factbase.json", "r") as file:
    factbase = sq.factbase.GeotiffArchive(json.load(file), src = "files/resources.zip")

# Extent.
space = sq.SpatialExtent(gpd.read_file("files/footprint.geojson"))
time = sq.TemporalExtent("2019-01-01", "2020-12-31")

# Additional configuration.
config = {"crs": 3035, "tz": "UTC", "spatial_resolution": [-1800, 1800]}

## Verbs for single data cubes

Most verbs in semantique are verbs that apply an action to a single data cube. The currently implemented verbs in this category are:

- [Evaluate](#Evaluate): Evaluates an expression for each pixel in a data cube.
- [Extract](#Extract): Extracts dimension coordinates of a specified dimension in a data cube.
- [Filter](#Filter): Filters values from a data cube.
- [Reduce](#Reduce): Reduces sets of values along a specified dimension to one and subsequently removes that dimension.
- [Groupby](#Groupby): Groups a data cube into a collection of multiple subsets.
- [Label](#Label): Assigns a label to a data cube.

### Evaluate

The evaluate verb evaluates an expression for each pixel in the input data cube. These expressions can take many different forms, but each of them accepts a pixel value of the input cube as input, and applies some operator to it. The result of that operation is the new value of that particular pixel in the output cube. That is, the output cube always has the *same shape* as the input cube. Below, we will show different forms of expressions, and the built-in [operators](https://zgis.github.io/semantique/reference.html#operator-functions) that semantique offers for them. It is also possible to use custom operators that you define for yourself, but we will not handle that here. See the [Advanced usage notebook](https://zgis.github.io/semantique/_notebooks/advanced.html#Adding-custom-operators) instead.

When using the evaluate verb, it is important to be aware of the value type of your input cube(s). For example, they may contain nominal, ordinal, binary or numerical data. Different operators may only support specific (combinations of) value types. For details, see the [Advanced usage notebook](https://zgis.github.io/semantique/_notebooks/advanced.html#Tracking-value-types).

#### Simple univariate expressions

The simplest form of expressions are the univariate expressions without any additional constants or parameters. For each pixel value $x_{i}$ in input cube $X$, these expressions *only* consider $x_{i}$, and apply an operator to it. That is, the expression is of the following form.

$$
expression = operator(x_{i})
$$

This means that the output cube has the same shape as the input cube, and that each pixel value in the output cube is the result of the univariate expression evaluated on the value of the corresponding pixel in the input cube.

![evaluate_univariate](figures/evaluate_uni.png)

The built-in operators for this purpose include the **numerical univariate operators**, which are intended for usage on numerical data cubes:

- `absolute`: Computes the absolute value of $x_{i}$.
- `exponential`: Computes the exponential value of $x_{i}$, i.e. $e^{x_{i}}$.
- `natural_logarithm`: Computes the natural logarithm of $x_{i}$.
- `square_root`: Computes the square root of $x_{i}$.
- `cube_root`: Computes the cube root of $x_{i}$.

Next to those there are the **boolean univariate operators**, which are intended for usage on binary data cubes:

- `invert`: Returns $1$ if $x_{i} = 0$, and $0$ if $x_{i} \neq 0$.
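
The semantics of these operators can be mimicked on plain NumPy arrays. The following is only an illustrative sketch, not how semantique implements them internally; the arrays `x` and `b` are hypothetical stand-ins for a numerical and a binary data cube.

```python
import numpy as np

# Hypothetical 2x2 cubes, for illustration only.
x = np.array([[-4.0, 0.0], [1.0, 9.0]])  # numerical values
b = np.array([[1, 0], [0, 1]])           # binary values

absolute = np.abs(x)             # |x_i|
exponential = np.exp(x)          # e^{x_i}
square_root = np.sqrt(absolute)  # taken on |x_i| here, to avoid NaN for negatives
cube_root = np.cbrt(x)

# invert: 1 where the input is 0, and 0 where it is non-zero.
inverted = np.where(b == 0, 1, 0)
```

In every case the output array has the same shape as the input array, which mirrors the "same shape" property of the evaluate verb.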

For example, applying the `invert` operator to the water cube marks all non-water pixels as 1 and all water pixels as 0.

In [4]:
recipe = sq.QueryRecipe()

In [5]:
recipe["water"] = sq.entity("water")
recipe["not_water"] = sq.result("water").evaluate("invert")

In [6]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [7]:
out["water"]

In [8]:
out["not_water"]

#### Univariate expressions involving constants

Other expressions involve a constant value in addition to the pixel values from the input cube. Such expressions consist of a left-hand side value, a right-hand side value and an operator that in some way combines these values. For each pixel value $x_{i}$ in input cube $X$, the left-hand side value of the expression is $x_{i}$, and the right-hand side value of the expression is a constant value $y$. Hence, $y$ remains the same no matter for which pixel in $X$ the expression is evaluated. For example, we could add 1 to each pixel value, or multiply each pixel value by 2.

$$
expression = x_{i} \; operator \; y
$$

This means that the output cube has the same shape as the input cube, and that each pixel value in the output cube is the result of the expression evaluated on the value of the corresponding pixel in the input cube, with each expression containing some constant value $y$ which is the same for each pixel.

![evaluate_constant](figures/evaluate_const.png)

The built-in operators for this purpose can be subdivided into different categories. The **algebraic operators** are intended for usage on numerical data cubes and perform an operation of arithmetic:

- `add`: Adds some constant $y$ to each pixel value $x_{i}$, e.g. $x_{i} + 2$ when $y = 2$.
- `subtract`: Subtracts some constant $y$ from each pixel value $x_{i}$, e.g. $x_{i} - 2$ when $y = 2$.
- `multiply`: Multiplies each pixel value $x_{i}$ by some constant $y$, e.g. $x_{i} \times 2$ when $y = 2$.
- `divide`: Divides each pixel value $x_{i}$ by some constant $y$, e.g. $\frac{x_{i}}{2}$ when $y = 2$.
- `power`: Raises each pixel value $x_{i}$ to the *y*th power for some constant $y$, e.g. $x_{i} ^ 2$ when $y = 2$.

The **relational operators** evaluate a condition involving two values. The result is always a boolean value, i.e. true (1) when the condition holds and false (0) when it doesn't. Some of the conditions test for equality, and hence can be used on any data cube as long as the constant $y$ matches the value type of the data cube:

- `equal`: Returns $1$ if $x_{i} = y$, and $0$ otherwise.
- `not_equal`: Returns $1$ if $x_{i} \neq y$, and $0$ otherwise.

The other conditions imply a fixed order among the values, and hence should not be used on nominal data cubes:

- `greater`: Returns $1$ if $x_{i} > y$, and $0$ otherwise.
- `less`: Returns $1$ if $x_{i} < y$, and $0$ otherwise.
- `greater_equal`: Returns $1$ if $x_{i} \geq y$, and $0$ otherwise.
- `less_equal`: Returns $1$ if $x_{i} \leq y$, and $0$ otherwise.
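
As a rough illustration of what these operators compute, the same expressions can be written with NumPy. This is a sketch with a hypothetical array `x` and constant `y`, not semantique's own implementation.

```python
import numpy as np

x = np.array([[0.5, 1.0], [2.0, 4.0]])  # hypothetical pixel values
y = 2                                   # the constant, identical for every pixel

# Algebraic operators.
added = x + y
multiplied = x * y
powered = x ** y

# Relational operators, expressed as 1 (true) and 0 (false).
greater = (x > y).astype(int)
greater_equal = (x >= y).astype(int)
equal = (x == y).astype(int)
```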

In [9]:
recipe = sq.QueryRecipe()

In [10]:
recipe["greenness"] = sq.appearance("greenness")
recipe["twice_greenness"] = sq.result("greenness").evaluate("multiply", 2)
recipe["high_greenness"] = sq.result("greenness").evaluate("greater", 2)

In [11]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [12]:
out["greenness"].round(2)

In [13]:
out["twice_greenness"].round(2)

In [14]:
out["high_greenness"]

#### Bivariate expressions

In the examples above $y$ remained the same no matter for which pixel in $X$ the expression was evaluated. Instead, $y$ could be a variable as well. That turns the same expressions into *bivariate expressions*. In practice this means that when evaluating the expression for the pixel with value $x_{i} \in X$, we set $y$ to be equal to pixel value $y_{i}$ taken from another data cube $Y$.

$$
expression = x_{i} \; operator \; y_{i}
$$

This means that the output cube has the same shape as the input cube, and that each pixel value in the output cube is the result of the bivariate expression evaluated on the value of the corresponding pixel in the input cube *and* the value of the pixel with the same coordinates in another data cube $Y$. Note that this does require that $Y$ has the *same shape* as $X$ (or at least can be aligned to that shape, which we will explain later in this sub-section). Only then does each pixel in $X$ have a *matching* pixel in $Y$.

![evaluate_multivariate](figures/evaluate_multi.png)

We can use the same **algebraic operators** in these cases. These are meant to be used in expressions in which both sides are numerical data cubes:

- `add`: Adds pixel value $y_{i} \in Y$ to each pixel value $x_{i} \in X$.
- `subtract`: Subtracts pixel value $y_{i} \in Y$ from each pixel value $x_{i} \in X$.
- `multiply`: Multiplies each pixel value $x_{i} \in X$ by pixel value $y_{i} \in Y$.
- `divide`: Divides each pixel value $x_{i} \in X$ by pixel value $y_{i} \in Y$.
- `power`: Raises each pixel value $x_{i} \in X$ to the *y*th power, where $y = y_{i} \in Y$.

We can also use the **relational operators**. Those that test for equality can be used on any data cube as long as the value type of the second data cube is the same as the value type of the first data cube:

- `equal`: Returns $1$ if $x_{i} = y_{i}$, and $0$ otherwise.
- `not_equal`: Returns $1$ if $x_{i} \neq y_{i}$, and $0$ otherwise.

Those relational operators that imply a fixed order in the values of the cube can be used on any non-nominal data cube as long as the value type of the second data cube is the same as the value type of the first data cube:

- `greater`: Returns $1$ if $x_{i} > y_{i}$, and $0$ otherwise.
- `less`: Returns $1$ if $x_{i} < y_{i}$, and $0$ otherwise.
- `greater_equal`: Returns $1$ if $x_{i} \geq y_{i}$, and $0$ otherwise.
- `less_equal`: Returns $1$ if $x_{i} \leq y_{i}$, and $0$ otherwise.

There is an additional set of operators we can use for bivariate expressions. These are the **boolean operators**. They are intended to be used in expressions involving two binary data cubes.

- `and`: Returns $1$ when both $x_{i} \neq 0$ *and* $y_{i} \neq 0$, and $0$ otherwise.
- `or`: Returns $1$ when $x_{i} \neq 0$, $y_{i} \neq 0$, or both, and $0$ otherwise.
- `exclusive_or`: Returns $1$ when either $x_{i} \neq 0$ *or* $y_{i} \neq 0$, but *not* both, and $0$ otherwise.
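
The truth tables of the boolean operators can be checked with a small NumPy sketch. The arrays `x` and `y` are hypothetical binary cubes of equal shape that together cover all four input combinations; this illustrates the definitions above and is not semantique's own code.

```python
import numpy as np

x = np.array([1, 1, 0, 0])  # hypothetical binary cube X
y = np.array([1, 0, 1, 0])  # hypothetical binary cube Y

x_true = x != 0
y_true = y != 0

and_ = (x_true & y_true).astype(int)          # both non-zero
or_ = (x_true | y_true).astype(int)           # at least one non-zero
exclusive_or = (x_true ^ y_true).astype(int)  # exactly one non-zero
```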

In [15]:
recipe = sq.QueryRecipe()

In [16]:
recipe["water"] = sq.entity("water")
recipe["vegetation"] = sq.entity("vegetation")
recipe["water_or_vegetation"] = sq.result("water").evaluate("or", sq.result("vegetation"))

In [17]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [18]:
out["water"]

In [19]:
out["vegetation"]

In [20]:
out["water_or_vegetation"]

As mentioned above, cube $Y$ does not necessarily have to be of the same shape as input cube $X$, but it should at least be possible to *align* it to that shape. This can be done in two ways.

First consider the case where $Y$ has the same dimensions as $X$, but not all coordinate values of $X$ are present in $Y$. In that case, we can align $Y$ with $X$ such that pixel values at position $i$ in both cubes, i.e. $x_{i}$ and $y_{i}$ respectively, belong to pixels with the *same coordinates*. If $y_{i}$ was not originally part of $Y$, we assign it a nodata value. This also works vice-versa, with coordinate values in $Y$ that are not present in $X$.

![evaluate_align_same_dims](figures/evaluate_align1.png)

Secondly, consider a case where $Y$ has one or more dimensions with exactly the same coordinate values as $X$, but does not have *all* the dimensions that $X$ has. In that case, we can align $Y$ with $X$ by duplicating its values along those dimensions that are missing. This does *not* work vice versa. When cube $Y$ has more dimensions than cube $X$, there is no clear way to define how to subset the values in $Y$ to match the shape of $X$.

![evaluate_align_missing_dims](figures/evaluate_align2.png)
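
Duplicating values along missing dimensions is conceptually the same as array broadcasting. The sketch below uses NumPy to illustrate the idea; it assumes a hypothetical cube `x` with dimensions (time, y, x) and a cube `y` that only has the time dimension, and it is not the alignment code that semantique runs internally.

```python
import numpy as np

# Hypothetical cube X with dimensions (time, y, x).
x = np.arange(12, dtype=float).reshape(3, 2, 2)
# Hypothetical cube Y with only the time dimension.
y = np.array([10.0, 20.0, 30.0])

# Duplicate Y along the two missing spatial dimensions to match X's shape.
y_aligned = np.broadcast_to(y[:, None, None], x.shape)

# Now every pixel in X has a matching pixel in Y.
result = x + y_aligned
```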

Alignment is something you have to be aware of to understand how bivariate expressions are evaluated. However, it is not something you have to do for yourself. Internally, the query processor takes care of it. See the [Advanced usage notebook](advanced.ipynb#Aligning-cubes-to-each-other) for details on that.

#### Special operators

Next to the operators already mentioned, there are some additional operators with a special behaviour.

##### The in and not_in operators

The `in` and `not_in` operators are special types of relational operators that test for set membership. That is, they test for each pixel value $x_{i}$ in cube $X$ whether it is or is not a member of a set of values $Y$.

- `in`: Returns $1$ if $x_{i} \in Y$, and $0$ otherwise. Here, $Y$ is a finite set of values that remains constant for each $x_{i}$.
- `not_in`: Returns $1$ if $x_{i} \notin Y$, and $0$ otherwise. Here, $Y$ is a finite set of values that remains constant for each $x_{i}$.
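
Set membership per pixel can be illustrated with `numpy.isin`. This is only a sketch of the semantics, using a hypothetical array of category indices rather than a real data cube.

```python
import numpy as np

x = np.array([[21, 5], [24, 30]])  # hypothetical category indices
members = [21, 22, 23, 24]         # the set Y, identical for every pixel

is_in = np.isin(x, members).astype(int)
not_in = np.isin(x, members, invert=True).astype(int)
```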

In [21]:
recipe = sq.QueryRecipe()

In [22]:
recipe["colors"] = sq.appearance("Color type")
recipe["water"] = sq.result("colors").evaluate("in", [21, 22, 23, 24])

In [23]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [24]:
out["colors"]

In [25]:
out["water"]

When the values in $X$ have value labels defined (i.e. when category indices are mapped to category labels), you can also use the labels as set members, instead of the indices. For that, use the [value_label()](https://zgis.github.io/semantique/_generated/semantique.value_label.html) function:

In [26]:
labels = [sq.value_label("DPWASH"), sq.value_label("SLWASH"), sq.value_label("TWASH"), sq.value_label("SASLWA")]
recipe["water"] = sq.result("colors").evaluate("in", labels)

In [27]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [28]:
out["water"]

To make your life easier, semantique also includes a [value_range()](https://zgis.github.io/semantique/_generated/semantique.value_label.html) function. This function allows you to specify a range of values by only providing the start and the end of the range. The range is handled as a closed interval, meaning that both the start and end are included. Do note that value ranges are only supported for numerical or ordinal data.

In [29]:
recipe["water"] = sq.result("colors").evaluate("in", sq.value_range(21, 24))

In [30]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [31]:
out["water"]

$Y$ may be another data cube, but in that case it will be treated as being a set. That is, for each pixel value $x_{i} \in X$ it will be tested if it occurs anywhere in $Y$.

In [32]:
recipe = sq.QueryRecipe()

In [33]:
recipe["foo"] = sq.entity("water").evaluate("in", sq.entity("vegetation"))

In [34]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [35]:
out["foo"]

##### The assignment operator

The `assign` operator assigns each pixel $x_{i}$ of data cube $X$ a new value that is not a function of the original value. The new value can either be some constant $y$, or the corresponding pixel value $y_{i}$ in another data cube $Y$.
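
A minimal NumPy sketch of this behaviour follows. It assumes that nodata is represented as NaN and that nodata pixels in $X$ keep their nodata value, as this section notes; the arrays are hypothetical and the code is not taken from semantique itself.

```python
import numpy as np

x = np.array([[0.3, np.nan], [1.7, 2.4]])  # hypothetical cube X with one nodata pixel

# Assign a constant: every valid pixel becomes 0, nodata stays nodata.
assigned_constant = np.where(np.isnan(x), np.nan, 0)

# Assign from another cube Y of the same shape (here: hypothetical years).
y = np.array([[2019, 2019], [2020, 2020]])
assigned_from_cube = np.where(np.isnan(x), np.nan, y)
```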

A trivial example (note that nodata values in $X$ are always preserved, they do *not* get replaced):

In [36]:
recipe = sq.QueryRecipe()

In [37]:
recipe["greenness"] = sq.appearance("greenness")
recipe["zeroes"] = sq.result("greenness").evaluate("assign", 0)

In [38]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [39]:
out["greenness"].round(2)

In [40]:
out["zeroes"]

A more useful example follows from the alignment behaviour as explained in the previous section. This allows you, for example, to replace all pixel values in a data cube by the timestamp (or a component of the timestamp, e.g. the year) at which they were observed. Timestamps and their components can be extracted with the [extract()](#Extract) verb.

In [41]:
recipe["years"] = sq.result("greenness").evaluate("assign", sq.self().extract("time", "year"))

In [42]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [43]:
out["years"]

##### The temporal relational operators

The temporal relational operators are comparison operators specifically designed to deal with *time instants* and *time intervals* as operand values. Hence, they require each pixel value $x_{i}$ in input cube $X$ to be a time instant. The right-hand side of the expression can either be a single time instant, a time interval (i.e. a list of two time instants representing the start and end of the interval) or another data cube $Y$ in which each pixel value $y_{i}$ is a time instant. The latter will in practice be evaluated as being a time interval with the earliest time instant being the start of the interval, and the latest time instant the end of the interval.

The currently implemented temporal relational operators are:

- `after`: When $y$ is a time instant: returns $1$ if $x_{i} > y$, and $0$ otherwise. When $y$ is a time interval: returns $1$ if $x_{i} > max(y)$, and $0$ otherwise.
- `before`: When $y$ is a time instant: returns $1$ if $x_{i} < y$, and $0$ otherwise. When $y$ is a time interval: returns $1$ if $x_{i} < min(y)$, and $0$ otherwise.
- `during`: Returns $1$ if $min(y) \leq x_{i} \leq max(y)$, and $0$ otherwise. Only intended for time intervals as $y$.
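
The three definitions above can be written out with NumPy `datetime64` values. This is an illustrative sketch only: the timestamps, the instant and the interval are hypothetical, and semantique itself works with its own time instant and time interval objects.

```python
import numpy as np

# Hypothetical time coordinates, i.e. the pixel values x_i.
times = np.array(
    ["2019-06-01", "2019-12-31", "2020-03-15", "2021-01-01"],
    dtype="datetime64[D]",
)

instant = np.datetime64("2019-12-31")
start, end = np.datetime64("2020-01-01"), np.datetime64("2020-12-31")

# With a time instant as right-hand side.
after_instant = (times > instant).astype(int)
before_instant = (times < instant).astype(int)

# With a time interval as right-hand side.
after_interval = (times > end).astype(int)
before_interval = (times < start).astype(int)
during = ((times >= start) & (times <= end)).astype(int)
```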

To construct time instants and time intervals for usage as right-hand operand, you can use the [time_instant()](https://zgis.github.io/semantique/_generated/semantique.time_instant.html) and [time_interval()](https://zgis.github.io/semantique/_generated/semantique.time_interval.html) functions that semantique offers. The first expects a single datetime value, while the latter expects two datetime values (i.e. the start and end of the interval, with the interval being closed at both sides). You can provide datetimes in formats such as "2020-12-31" or "2020/12/31", but also complete ISO8601 timestamps such as "2020-12-31T14:37:22". As long as the [Timestamp](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html) initializer of the [pandas](https://pandas.pydata.org/) package can understand it, it is supported by semantique. Any additional keyword arguments will be forwarded to this initializer.

In [44]:
recipe = sq.QueryRecipe()

In [45]:
recipe["times"] = sq.entity("water").extract("time")
recipe["early"] = sq.result("times").evaluate("before", sq.time_instant("2019-12-31"))
recipe["late"] = sq.result("times").evaluate("during", sq.time_interval("2020-01-01", "2020-12-31"))

In [46]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [47]:
out["early"]

In [48]:
out["late"]

##### The spatial relational operators

The spatial relational operators are comparison operators specifically designed to deal with *geometries* as operand values. They require each pixel value $x_{i}$ in input cube $X$ to be a tuple of spatial (x,y) coordinates. The right-hand side of the expression can either be one or more spatial geometries, or another data cube $Y$ in which each pixel value $y_{i}$ is a coordinate tuple. The latter will in practice be evaluated as being a geometry covering the spatial bounding box of the cube.

The currently implemented spatial relational operators are:

- `intersects`: Returns $1$ if the spatial point with the coordinates of $x_{i}$ spatially intersects with geometry $y$, and $0$ otherwise.

To construct spatial geometries for usage as right-hand operand, you can use the [geometries()](https://zgis.github.io/semantique/_generated/semantique.geometries.html) function that semantique offers. It expects an object that can be read by the [GeoDataFrame](https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.html) initializer of the [geopandas](https://geopandas.org/en/stable/) package. Any additional keyword arguments will be forwarded to this initializer. In practice, this means you can read any GDAL-supported file format with [geopandas.read_file()](https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html), and then use that object to create spatial geometries.

In [49]:
recipe = sq.QueryRecipe()

In [50]:
parcels = gpd.read_file("files/parcels.geojson")
recipe["coords"] = sq.entity("water").extract("space")
recipe["in_parcel"] = sq.result("coords").evaluate("intersects", sq.geometries(parcels))

In [51]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [52]:
out["in_parcel"]

### Extract

The extract verb extracts the coordinates of a specified dimension from a data cube.

![extract](figures/extract.png)

In [53]:
recipe = sq.QueryRecipe()

In [54]:
recipe["time"] = sq.entity("water").extract("time")
recipe["space"] = sq.entity("water").extract("space")

In [55]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [56]:
out["time"]

In [57]:
out["space"]

Coordinate values of some dimensions may consist of multiple components. For example, the spatial dimension contains coordinate tuples that consist of x and y coordinates, and the time dimension contains timestamps that consist of a year, a month, a day, an hour, etc. If you are only interested in a single component of a dimension, you can specify that through the second, optional argument of the extract verb.

Extracting coordinates of a (component of a) dimension is useful for example when you want to filter a data cube based on its dimension coordinates rather than based on its values (e.g. [temporal filtering](#Aligning-the-filterer)), when you want to [replace](#The-assignment-operator) pixel values by their time of observation, or when you want to split a data cube into multiple groups along a dimension (e.g. [grouping by year](#Groupby)).
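
To make the idea of coordinate components concrete, the sketch below pulls the year and month out of a few timestamps and the x component out of a few coordinate tuples. The values are hypothetical and the code uses plain NumPy rather than the extract verb itself.

```python
import numpy as np

# Hypothetical time coordinates.
times = np.array(["2019-03-01", "2019-11-20", "2020-07-04"], dtype="datetime64[D]")

# datetime64[Y] counts years since 1970, datetime64[M] counts months since 1970-01.
years = times.astype("datetime64[Y]").astype(int) + 1970
months = times.astype("datetime64[M]").astype(int) % 12 + 1

# Hypothetical spatial coordinates as (x, y) tuples.
coords = [(4530000, 2696000), (4531800, 2696000), (4530000, 2694200)]
xcoords = [xy[0] for xy in coords]
```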

In [58]:
recipe = sq.QueryRecipe()

In [59]:
recipe["years"] = sq.entity("water").extract("time", "year")
recipe["xcoords"] = sq.entity("water").extract("space", "x")

In [60]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [61]:
out["years"]

In [62]:
out["xcoords"]

### Filter

The filter verb filters values from a data cube. That is, the output cube is a *subset* of the input cube. Which values in the input cube are kept, and which are removed, is defined by a second, binary data cube which we call the *filterer*. The filterer should have the same shape as the input cube, such that each pixel value $x_{i}$ in input cube $X$ has a *matching* pixel value $y_{i}$ in filterer $Y$. Then:

- $x_{i}$ is kept if $y_{i} \neq 0$.
- $x_{i}$ is removed if $y_{i} = 0$.

![filter](figures/filter.png)
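
The keep/remove rule can be sketched with NumPy as follows. The sketch assumes that removed pixels are represented as NaN (nodata) so that the array keeps its shape; the arrays `x` and `filterer` are hypothetical.

```python
import numpy as np

x = np.array([[1.0, 0.0], [1.0, 1.0]])  # hypothetical input cube X
filterer = np.array([[1, 0], [0, 1]])   # hypothetical binary filterer Y

# Keep x_i where y_i is non-zero, mark it as nodata otherwise.
filtered = np.where(filterer != 0, x, np.nan)
```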

For example, we may filter only those pixels in our water cube that were *not* covered by clouds.

In [63]:
recipe = sq.QueryRecipe()

In [64]:
recipe["water"] = sq.entity("water")
recipe["cloud"] = sq.entity("cloud")
recipe["not_cloud"] = sq.result("cloud").evaluate("invert")
recipe["masked"] = sq.result("water").filter(sq.result("not_cloud"))

In [65]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [66]:
out["water"]

In [67]:
out["not_cloud"]

In [68]:
out["masked"]

We could also filter our vegetation cube to keep only those pixel values with a greenness > 1.5:

In [69]:
recipe = sq.QueryRecipe()

In [70]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["greenness"] = sq.appearance("greenness")
recipe["green_vegetation"] = sq.result("vegetation").filter(sq.result("greenness").evaluate("greater", 1.5))

In [71]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [72]:
out["vegetation"]

In [73]:
out["greenness"].round(2)

In [74]:
out["green_vegetation"]

#### Aligning the filterer

Just as with the $Y$ and $X$ cubes in [bivariate expressions](#Bivariate-expressions) in the [evaluate()](#Evaluate) verb, the filterer does not have to be of the same shape as the input cube, as long as we can align it to that shape. That means that we can also filter the input cube by the coordinates of one of its dimensions. All we have to do is to construct a filterer for that dimension, i.e. a boolean, one-dimensional data cube that specifies for each of the coordinate values of that dimension if it should be kept (i.e. 1) or removed (i.e. 0).

![filter_align](figures/filter_align.png)
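
A one-dimensional filterer along the time dimension can be sketched with NumPy broadcasting. The cube, its dimension order (time, y, x) and the list of years are hypothetical, and removed pixels are represented as NaN by assumption.

```python
import numpy as np

cube = np.arange(12, dtype=float).reshape(3, 2, 2)  # hypothetical (time, y, x) cube
years = np.array([2019, 2020, 2020])                # year of each time coordinate

# One-dimensional boolean filterer: 1 for coordinates to keep, 0 otherwise.
keep = (years == 2020).astype(int)

# Align the filterer to the cube by duplicating it along the spatial dimensions.
filterer = np.broadcast_to(keep[:, None, None], cube.shape)
filtered = np.where(filterer != 0, cube, np.nan)
```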

For example, when we only want to keep pixel values observed in 2020:

In [75]:
recipe = sq.QueryRecipe()

In [76]:
recipe["2020"] = sq.entity("water").filter(sq.self().extract("time", "year").evaluate("equal", 2020))

In [77]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [78]:
out["2020"]

#### The filter_time and filter_space verbs

You can also use a handy shortcut for the above formulation: the filter_time verb. This verb allows you to apply a temporal filter *without* having to explicitly extract the time coordinates from the active evaluation object and evaluate a comparison expression on them.

The filter_time verb is only a "shortcut" verb, not an independent verb on its own. This means that when calling the filter_time verb, it is internally translated into a textual query recipe containing the self reference and the extract and evaluate verbs instead. In the same way, you can also use the shortcut verb filter_space for spatial filters.

In [79]:
recipe["2020"] = sq.entity("water").filter_time("year", "equal", 2020)

In [80]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [81]:
out["2020"]

#### Self-filtering

A special type of a filtering operation is self-filtering, i.e. filtering a data cube by itself. In this case, the input cube should be boolean. In the output, the "true" values will be preserved, while the "false" values are removed.

![filter_self](figures/filter_self.png)

In [82]:
recipe = sq.QueryRecipe()

In [83]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["true_vegetation"] = sq.result("vegetation").filter(sq.self())

In [84]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [85]:
out["vegetation"]

In [86]:
out["true_vegetation"]

### Reduce

The reduce verb applies a reducer function along a dimension and subsequently drops the reduced dimension. That is, the output cube always has one dimension less than the input cube.

To reduce a dimension, the reducer function operates on each slice of values along the axis of the dimension. Such a *slice* contains one value for each coordinate label of the dimension to reduce over, while the coordinate labels of all other dimensions are *constant* within each slice. The reducer function always returns a single value, such that each slice gets reduced from $n$ values to one value.

For example: a data cube with a spatial and a temporal dimension contains for each location in space $n$ values, where $n$ is the number of timestamps in the temporal dimension. When we reduce the temporal dimension of this cube, the reducer function reduces these $n$ values for each location in space to one value. The resulting cube has a single value per location in space, and no temporal dimension anymore.

![reduce](figures/reduce.png)
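
The effect of reducing a dimension on the shape of a cube can be illustrated with NumPy. The binary cube below is hypothetical, with dimensions assumed to be ordered (time, y, x); reducing over "space" is sketched here as reducing over both spatial axes at once.

```python
import numpy as np

# Hypothetical binary cube with 3 timestamps and a 2x2 spatial grid.
cube = np.array([
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])

# Reduce over time: one value per location, no temporal dimension left.
mean_map = cube.mean(axis=0)

# Reduce over space: one value per timestamp, no spatial dimension left.
mean_ts = cube.mean(axis=(1, 2))
```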

There are many different types of reducers available in semantique. It is also possible to use custom reducers that you define for yourself, but we will not cover that here. See the [Advanced usage notebook](https://zgis.github.io/semantique/_notebooks/advanced.html#Adding-custom-reducers) instead.

- `mean`: Returns the average value of each slice $S$.
- `median`: Returns the median value of each slice $S$.
- `mode`: Returns the most frequently occurring value in each slice $S$.
- `max`: Returns the largest value in each slice $S$.
- `min`: Returns the smallest value in each slice $S$.
- `range`: Returns the difference between the largest and smallest value in each slice $S$.
- `n`: Returns the number of observations in each slice $S$.
- `product`: Returns the product of the values in each slice $S$.
- `sum`: Returns the sum of the values in each slice $S$.
- `standard_deviation`: Returns the standard deviation of the values in each slice $S$.
- `variance`: Returns the variance of the values in each slice $S$.
- `all`: For each slice $S$, returns $1$ if all $x_{i} \in S \neq 0$, and $0$ otherwise.
- `any`: For each slice $S$, returns $1$ if any $x_{i} \in S \neq 0$, and $0$ otherwise.
- `count`: Counts the number of non-zero values in each slice $S$.
- `percentage`: Calculates the percentage of non-zero values in each slice $S$.
- `first`: Returns the first value of each slice $S$.
- `last`: Returns the last value of each slice $S$.

It is important to mention that nodata values are **ignored** by the reducer functions! That is, for example, when a slice has the values `[1, 1, nan, 1]` the `all` reducer will return `1` and the `percentage` reducer will return `100`. A reducer will only return a nodata value when *all* values in a slice are nodata values.
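
The nodata behaviour described above can be reproduced with a small sketch. The helper `reduce_slice` and the two reducer functions are hypothetical names written for this illustration, with nodata assumed to be NaN; they are not part of semantique's API.

```python
import numpy as np

def reduce_slice(values, reducer):
    """Apply a reducer to one slice, ignoring nodata (NaN) values."""
    valid = values[~np.isnan(values)]
    if valid.size == 0:
        return np.nan  # only an all-nodata slice yields nodata
    return reducer(valid)

def all_(valid):
    # 1 if all valid values are non-zero, 0 otherwise.
    return int(np.all(valid != 0))

def percentage(valid):
    # Percentage of non-zero values among the valid values.
    return 100 * np.count_nonzero(valid) / valid.size

s = np.array([1.0, 1.0, np.nan, 1.0])
all_result = reduce_slice(s, all_)
percentage_result = reduce_slice(s, percentage)
empty_result = reduce_slice(np.array([np.nan, np.nan]), all_)
```

For the slice `[1, 1, nan, 1]` this yields `1` for `all_` and `100` for `percentage`, matching the example in the text, while the all-nodata slice yields nodata.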

Also, when using the reduce verb, it is important to be aware of the value type of your input cube. For example, it may contain nominal, ordinal, binary or numerical data. Different reducer functions may only support specific value types. For details, see the [Advanced usage notebook](https://zgis.github.io/semantique/_notebooks/advanced.html#Tracking-value-types).

Having said that, let's show some examples. The reduce verb takes as first argument the name of the dimension to be reduced over, and as second argument the reducer function to be applied.

In [87]:
recipe = sq.QueryRecipe()

In [88]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["count_map"] = sq.result("vegetation").reduce("time", "count")
recipe["percentage_ts"] = sq.result("vegetation").reduce("space", "percentage")
recipe["occurence_map"] = sq.result("vegetation").reduce("time", "any")

In [89]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [90]:
out["vegetation"]

In [91]:
out["count_map"]

In [92]:
out["occurence_map"]

In [93]:
out["percentage_ts"].round(1)

### Groupby

The groupby verb splits a data cube into multiple smaller data cubes, called groups. That is, the output cube is a collection of multiple subsets of the input cube. Grouping is always done *along* a dimension. That means that first the coordinate labels of this dimension are divided into distinct groups. Then, the input cube is split such that for each of these groups there is a subset of the input cube containing all pixels that have a coordinate for the given dimension which matches a label in the group. How the coordinate labels are grouped is defined by a second, categorical data cube which we call the *grouper*. The grouper should be a one-dimensional data cube with a dimension that *matches* an existing dimension in the input cube. Then, coordinate labels $\theta_{i}$ and $\theta_{j}$ of grouper $Y$ are in the same group if and only if $y_{i} = y_{j}$.

![groupby_single](figures/groupby_single.png)
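
The grouping rule can be sketched in NumPy by selecting, for each distinct grouper value, the slices of the cube whose coordinate carries that value. The cube, its (time, y, x) layout and the season labels are hypothetical; semantique returns a collection of data cubes rather than a dictionary.

```python
import numpy as np

cube = np.arange(16, dtype=float).reshape(4, 2, 2)  # hypothetical (time, y, x) cube

# Hypothetical grouper: one label per time coordinate.
grouper = np.array(["winter", "spring", "winter", "spring"])

# Time coordinates i and j end up in the same group iff grouper[i] == grouper[j].
groups = {str(label): cube[grouper == label] for label in np.unique(grouper)}
```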

For example, we may group the water cube along the time dimension such that pixels observed in different seasons end up in different subsets. The result of this operation is a collection of multiple data cubes. These data cube collection objects have specific verbs to combine their elements into a single data cube again. For a description of those, see [the next section](#Verbs-for-data-cube-collections).

In [94]:
recipe = sq.QueryRecipe()

In [95]:
recipe["seasons"] = sq.entity("water").groupby(sq.self().extract("time", "season"))

In [96]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [97]:
out["seasons"][0]

In [98]:
out["seasons"][1]

#### The groupby_time and groupby_space verbs

You can also use a handy shortcut for the above formulation: the groupby_time verb. This verb allows you to group along the temporal dimension *without* having to explicitly extract the time coordinates from the active evaluation object.

The groupby_time verb is only a "shortcut" verb, not an independent verb on its own. This means that when calling the groupby_time verb, it is internally translated into a textual query recipe containing the self reference and the extract verb instead. In the same way, you can also use the shortcut verb groupby_space for grouping directly along the spatial dimension.

In [99]:
recipe["seasons"] = sq.entity("water").groupby_time("season")

In [100]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [101]:
out["seasons"][0]

In [102]:
out["seasons"][1]

#### Multiple groupers

It is also possible to provide a collection of grouper cubes to the groupby verb, as long as their dimensions match. In that case, groups of coordinate labels are formed as follows: given grouper $Y$ and grouper $Z$ with matching coordinates, coordinate labels $\theta_{i}$ and $\theta_{j}$ are in the same group if and only if $y_{i} = y_{j}$ *and* $z_{i} = z_{j}$.

![groupby_multi](figures/groupby_multi.png)
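With several groupers, the group key effectively becomes a tuple of grouper values. The following plain-Python sketch illustrates this with made-up time labels. The helper `group_labels` is hypothetical and not part of semantique.

```python
from collections import defaultdict

# Made-up time labels and two groupers derived from them.
time_labels = ["2019-01-05", "2019-01-20", "2019-02-10", "2020-01-05"]
year = {t: int(t[:4]) for t in time_labels}    # grouper Y
month = {t: int(t[5:7]) for t in time_labels}  # grouper Z

def group_labels(labels, *groupers):
    # Two labels share a group only if they agree on every grouper.
    groups = defaultdict(list)
    for label in labels:
        key = tuple(g[label] for g in groupers)
        groups[key].append(label)
    return dict(groups)

groups = group_labels(time_labels, year, month)
```

Here the two January 2019 labels share a group, while January 2020 forms its own group because its year differs.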

That means for example that we can group the input cube along the time dimension in a way that two pixels observed in the same month but a different year end up in a different subset.

In [103]:
recipe = sq.QueryRecipe()

In [104]:
multigrouper = sq.collection(sq.self().extract("time", "year"), sq.self().extract("time", "month"))
recipe["groups"] = sq.entity("water").groupby(multigrouper)

In [105]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [106]:
out["groups"][0]

In [107]:
out["groups"][1]

In [108]:
out["groups"][2]

A shorter formulation for the above statement would be:

```python
sq.entity("water").groupby_time(["year", "month"])
```

### Label

One additional verb is the label verb. This verb does not perform any analytical task, but simply assigns a label to the input cube. In some cases this can be helpful. For example, when concatenating multiple cubes together along a new dimension (see [here](#Concatenate)), the labels of these cubes will be used as coordinate labels of this new dimension.

## Verbs for data cube collections

When constructing a query recipe, you can start a processing chain with a reference to a [collection of data cubes](https://zgis.github.io/semantique/_notebooks/advanced.html#Referencing-collections-of-data-cubes). The data cube collections have specific verbs that all in some way combine the elements of the collections back into a single data cube. The currently implemented verbs in this category are:

- [Concatenate](#Concatenate): Concatenates multiple data cubes over a new or an existing dimension.
- [Compose](#Compose): Creates a categorical composition of multiple boolean data cubes.
- [Merge](#Merge): For each pixel, merges its values in multiple data cubes into one by applying a reducer function.

It is important to mention that these verbs are intended for a collection of cubes that all have the *same* dimensions (but not necessarily the same coordinates)! They will also work on a collection of cubes that do not have the same dimensions, as long as they can all be aligned to each other. However, in these cases you should be aware of the peculiarities of alignment, see the section on [bivariate expressions](#Bivariate-expressions) for details. To summarize: When the cubes in the collection have the same dimensions, but don't share all of their coordinate labels, they get aligned to each other by filling the missing pixels in either of them with nodata values. When one of the cubes in the collection (e.g. $C_{1}$) is missing a dimension that is present in another cube in the collection (e.g. $C_{2}$), they get aligned to each other by duplicating the values of $C_{1}$ for each coordinate of the missing dimension. In any case, the coordinates of the output cube are always the *union* of all coordinates from the input cubes. The verb throws an error only when the input cubes cannot be aligned to each other in any way.
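The two alignment cases can be sketched in plain Python. This is not how semantique implements alignment. Cubes are modelled as dictionaries from coordinate labels to values, and the helpers `align_same_dims` and `broadcast` are hypothetical names used only here.

```python
NODATA = None  # stand-in for a nodata value

def align_same_dims(a, b):
    # Same dimensions, different coordinates: use the union of the
    # coordinates and fill pixels missing in either cube with nodata.
    coords = sorted(set(a) | set(b))
    return (
        {c: a.get(c, NODATA) for c in coords},
        {c: b.get(c, NODATA) for c in coords},
    )

def broadcast(cube, missing_dim_coords):
    # Missing dimension: duplicate each value for every coordinate
    # of the dimension the cube lacks.
    return {(k, m): v for k, v in cube.items() for m in missing_dim_coords}

# Case 1: two time series with partly overlapping time labels.
a = {"t1": 1, "t2": 0}
b = {"t2": 1, "t3": 1}
a_aligned, b_aligned = align_same_dims(a, b)

# Case 2: a purely spatial cube aligned to a cube that also has time.
c1 = {"p1": 5, "p2": 7}
c1_aligned = broadcast(c1, ["t1", "t2"])
```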

### Concatenate

The concatenate verb concatenates multiple data cubes along a given dimension. There are two main ways in which you can do this: either you concatenate along a *new* dimension, or you concatenate along an *existing* dimension.

#### Concatenating along a new dimension

Concatenating multiple data cubes along a new dimension is a relatively simple process. Each of the input cubes becomes one coordinate (i.e. one slice) along the new dimension of the output cube. Let's consider a collection with two two-dimensional data cubes $A$ and $B$ that have matching coordinates along the dimensions $\Gamma$ and $\Delta$. Concatenating them along a new dimension $E$ will result in a new three-dimensional cube $C$ with dimensions $\Gamma$, $\Delta$ and $E$. A pixel with coordinates $(\gamma_{i}, \delta_{i})$ in cube $A$ becomes a pixel with coordinates $(\gamma_{i}, \delta_{i}, \epsilon = A)$ in cube $C$, while the pixel with the same coordinates $(\gamma_{i}, \delta_{i})$ in cube $B$ becomes a pixel with coordinates $(\gamma_{i}, \delta_{i}, \epsilon = B)$ in cube $C$.

![concatenate_new](figures/concat_new.png)
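The coordinate mapping described above can be written down directly. In this plain-Python sketch, cubes are dictionaries keyed by $(\gamma, \delta)$ coordinate pairs, and `concatenate_new` is a hypothetical helper, not a semantique function.

```python
def concatenate_new(cubes):
    # cubes: mapping of cube name -> {(gamma, delta): value}.
    # The cube name becomes the coordinate along the new dimension.
    out = {}
    for name, cube in cubes.items():
        for (gamma, delta), value in cube.items():
            out[(gamma, delta, name)] = value
    return out

# Two toy cubes with matching coordinates (values made up).
A = {(0, 0): 1, (0, 1): 0}
B = {(0, 0): 0, (0, 1): 1}

C = concatenate_new({"A": A, "B": B})
```

Every pixel of $A$ and $B$ is preserved in $C$. Only its address gains a third coordinate that records which input cube it came from.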

All you have to provide to the concatenate verb is the name of the new dimension. Be careful with using "time" and "space" as names for your new dimension. The dimension name "time" is reserved for a temporal dimension with coordinate labels that are datetime objects, while the dimension name "space" is reserved for a spatial dimension with coordinate labels that are (x,y) coordinate pairs. Also, the names "x" and "y", "longitude" and "latitude" and "lon" and "lat" are generally used for the individual coordinate dimensions that make up the stacked "space" dimension.

The coordinate labels of the new dimension will be the names of the input cubes.

In [109]:
recipe = sq.QueryRecipe()

In [110]:
recipe["concepts"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    concatenate("concept")

In [111]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [112]:
out["concepts"]

#### Concatenating over an existing dimension

Concatenating over an existing dimension is mainly meant for cases where each of the input cubes has different coordinate labels for that dimension. For example, we have one cube with a time dimension containing dates in 2019, and another one with a time dimension containing dates in 2020. Then, concatenating them over the time dimension gives us a single cube with a time dimension containing both the dates from 2019 and 2020.

![concatenate_existing](figures/concat_existing.png)

In [113]:
recipe = sq.QueryRecipe()

In [114]:
recipe["water"] = sq.entity("water").\
    groupby_time("year").\
    concatenate("time")

In [115]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [116]:
out["water"]

We can also concatenate cubes that share coordinate labels along the dimension to concatenate over. However, for these shared coordinates, only the values of the *first* cube in the collection that contains the coordinate end up in the output cube. The values from the other cubes are simply dropped.
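The "first cube wins" rule for shared coordinate labels can be sketched as follows. The cubes are toy dictionaries from time labels to pixel values, and `concatenate_existing` is a hypothetical helper written for this illustration.

```python
def concatenate_existing(cubes):
    # Walk through the cubes in collection order. A label that was
    # already seen keeps the values of the first cube that had it.
    out = {}
    for cube in cubes:
        for label, values in cube.items():
            out.setdefault(label, values)
    return out

# Both cubes contain the label "2019-12-31" (values made up).
cube_2019 = {"2019-06-01": [1, 0], "2019-12-31": [0, 0]}
cube_2020 = {"2019-12-31": [1, 1], "2020-06-01": [0, 1]}

combined = concatenate_existing([cube_2019, cube_2020])
```

In `combined`, the label "2019-12-31" carries the values `[0, 0]` from the first cube. The values `[1, 1]` from the second cube are dropped.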

### Compose

The compose verb is primarily meant for collections of binary data cubes, i.e. data cubes that only have "true" (i.e. 1) and "false" (i.e. 0) values. Then, a pixel in the output cube gets a value of 1 when it was "true" in the first cube of the collection, a value of 2 if it was "true" in the second cube of the collection, a value of 3 if it was "true" in the third cube of the collection, et cetera. Hence, with the compose verb you convert a set of boolean data cubes into one categorical data cube.

When a pixel is "true" in more than one cube in the collection, it gets the index of the cube that comes first in the collection. Hence, if a pixel is "true" in both the second and third cube in a collection, it gets a value of 2 in the output cube. When a pixel is not "true" for any of the cubes in the collection, it gets a nodata value in the output cube.

![compose](figures/compose.png)
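The composition rule, including the tie-breaking and the nodata case, fits in a few lines of plain Python. The cubes below are flat lists of made-up pixel values, and `compose` is a hypothetical helper rather than the semantique implementation.

```python
NODATA = None  # stand-in for a nodata value

def compose(cubes):
    # Each pixel gets the 1-based index of the first cube in which
    # it is "true", or nodata if it is "true" in none of them.
    out = [NODATA] * len(cubes[0])
    for index, cube in enumerate(cubes, start=1):
        for i, value in enumerate(cube):
            if out[i] is NODATA and value:
                out[i] = index
    return out

water      = [1, 0, 0, 0, 1]
snow       = [0, 1, 0, 0, 1]
vegetation = [0, 1, 1, 0, 0]

land_cover = compose([water, snow, vegetation])
```

The second pixel is "true" for both snow and vegetation and therefore gets index 2. The fourth pixel is "true" nowhere and stays nodata.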

In [117]:
recipe = sq.QueryRecipe()

In [118]:
recipe["land_cover"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    compose()

In [119]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [120]:
out["land_cover"]

### Merge

The merge verb is actually a combination of two other verbs. First, it [concatenates](#Concatenate) the cubes in the collection along a new dimension, and then it [reduces](#Reduce) the output of that over this new dimension. In practice, that means that the merge verb applies a reduction function to each set of values belonging to the same pixel, but coming from different elements in the cube collection. For example, if we merge the water, snow and vegetation cube using the `any` reducer, we get an output cube that contains a "true" value (i.e. 1) for a pixel if the value of that pixel was "true" in at least one of the water, snow or vegetation cubes, and a "false" value (i.e. 0) if the value of that pixel was not "true" in any of those.

![merge](figures/evaluate_multi.png)

The only argument you need to provide to the verb is the reducer function. See the [reduce()](#Reduce) verb for an overview of them.
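As a rough illustration of "concatenate, then reduce", the sketch below merges three toy boolean cubes pixel by pixel. The helpers `merge` and `any_` are hypothetical plain-Python stand-ins, not semantique functions, and nodata handling is left out.

```python
def any_(values):
    # 1 if at least one value is "true", else 0.
    return int(any(values))

def merge(cubes, reducer):
    # zip(*cubes) stacks the values of each pixel across all cubes
    # (the concatenation step). The reducer then collapses that stack.
    return [reducer(values) for values in zip(*cubes)]

water      = [1, 0, 0, 0]
snow       = [0, 1, 0, 0]
vegetation = [0, 1, 1, 0]

any_concept = merge([water, snow, vegetation], any_)

# With only two cubes, the same result follows from a pixel-wise "or".
merged_pair = merge([water, snow], any_)
or_pair = [int(bool(a) or bool(b)) for a, b in zip(water, snow)]
```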

In [121]:
recipe = sq.QueryRecipe()

In [122]:
recipe["any_concept"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    merge("any")

In [123]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [124]:
out["any_concept"]

Note that the process of merging a collection of two cubes usually can be modelled as well with the [evaluate()](#Evaluate) verb. For example, the following lines produce identical results:

```python
sq.collection(A, B).merge("any")
A.evaluate("or", B)
```

However, where the evaluate verb can "merge" one other cube into a given input cube, the merge verb allows you to combine an unrestricted number of cubes in one go. Because of that, only [commutative operations](https://www.mathwords.com/c/commutative.htm) are supported in the merge verb.
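Why the order of the cubes must not matter can be seen from a small experiment. The sketch below folds the values of a single pixel, taken from three hypothetical cubes, in every possible cube order. It only illustrates order dependence and says nothing about how semantique checks its reducers.

```python
from functools import reduce
from itertools import permutations
import operator

# Values of one pixel in three cubes (made up).
values = (3, 5, 10)

def fold(op, seq):
    # Combine the values pairwise from left to right.
    return reduce(op, seq)

# Addition gives the same result for every ordering of the cubes.
add_results = {fold(operator.add, p) for p in permutations(values)}

# Subtraction depends on the ordering, so it has no well-defined
# meaning for an unordered combination of cubes.
sub_results = {fold(operator.sub, p) for p in permutations(values)}
```

Addition yields a single result whatever the order, whereas subtraction yields several different ones.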

### Evaluating single cube verbs on collections

All [verbs for single data cubes](#Verbs-for-single-data-cubes) (except the groupby verb) can also be applied to data cube collections. In that case, they will simply be applied to each element of the collection separately. Hence, the output will again be a data cube collection, with the same number of members.

This allows you to model well-known "split-apply-combine" processes, such as aggregation. You start with a single data cube, split it with the [groupby()](#Groupby) verb into a collection, apply one of the verbs for single data cubes to each of its members, and then combine them back together using one of the verbs for cube collections.

For example: we want to know the average water count over space for each year in our time dimension separately.
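Before turning to the actual recipe, the split-apply-combine steps of this example can be traced on toy data in plain Python. The cube values are made up, and the variable names are only illustrative.

```python
from collections import defaultdict
from statistics import mean

# Toy boolean water cube: time label -> pixel values.
cube = {
    "2019-03-01": [1, 0, 1],
    "2019-09-01": [1, 1, 1],
    "2020-03-01": [0, 0, 1],
    "2020-09-01": [0, 0, 0],
}

# Split: group the time labels by year.
groups = defaultdict(dict)
for t, pixels in cube.items():
    groups[t[:4]][t] = pixels

# Apply (1): count the water pixels over space, per timestamp and group.
counts = {
    year: {t: sum(pixels) for t, pixels in group.items()}
    for year, group in groups.items()
}

# Apply (2): average those counts over time within each group.
means = {year: mean(c.values()) for year, c in counts.items()}

# Combine: the group names become the labels of the new "year" dimension.
avg_count = dict(means)
```

Both apply steps run on every group independently, which is exactly what applying a single cube verb to a collection means.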

In [125]:
recipe = sq.QueryRecipe()

In [126]:
recipe["avg_count"] = sq.entity("water").\
    groupby_time("year").\
    reduce("space", "count").\
    reduce("time", "mean").\
    concatenate("year")

In [127]:
out = recipe.execute(factbase, ontology, space, time, **config)

In [128]:
out["avg_count"]

Another example: when we have a spatial extent consisting of multiple distinct spatial features, we might want to know the water count at each timestamp for each feature separately.

In [129]:
recipe = sq.QueryRecipe()

In [130]:
recipe["count_per_feat"] = sq.entity("water").\
    groupby_space("feature").\
    reduce("space", "count").\
    concatenate("feat")

In [131]:
space = sq.SpatialExtent(gpd.read_file("files/parcels.geojson"))
config = {"crs": 3035, "tz": "UTC", "spatial_resolution": [-100, 100]}

out = recipe.execute(factbase, ontology, space, time, **config)

In [132]:
out["count_per_feat"]