The Basics
==========

Viewing geospatial data isn't difficult: there are a number of graphical user
interfaces (GUIs) available that allow you to inspect your data. Modifying
and transforming the data, on the other hand, isn't so easy: changing single cells
is too laborious for all but the most trivial examples, and anything more
complicated than a single addition or multiplication quickly becomes intricate.

Fortunately, the Python package ``xarray`` provides a very versatile toolbox to do
exactly this. Let's start by importing the ``imod`` package and setting up some data.

In [None]:
import imod
# Suppress RuntimeWarnings to keep the output readable
import warnings
warnings.simplefilter(action="ignore", category=RuntimeWarning)
# Unzip the data directory
import zipfile
with zipfile.ZipFile("data.zip", "r") as f:
    f.extractall("data")

We have prepared two example files:

 1. A digital elevation model (DEM) in GeoTIFF (.tif) format.
 2. A groundwater model head result in .idf format.

We can load each of these into Python with an ``imod`` function.

In [None]:
dem = imod.rasterio.open("data/dem.tif")
head = imod.idf.open("data/head.idf")

Let's have a look at how these files are represented within Python by xarray:

In [None]:
print(dem)
print("\n")
print(head)

Both files have been loaded into an xarray object called a DataArray.
In overview, a DataArray consists of:

1. A (numpy) array with data.
2. Coordinates, describing in this case the geographical location of the data.

By printing the DataArrays above, we can see that the data has two dimensions,
which are described by the coordinates ``"y"`` and ``"x"``, as we would expect
from geospatial data.
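
To make that structure concrete, here is a minimal sketch that builds a small
DataArray by hand. The values, the coordinates, and the name ``example`` are
made up for illustration; they are not taken from the example files.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 raster with one nodata cell.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, np.nan]])

example = xr.DataArray(
    values,
    coords={"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]},  # made-up cell centres
    dims=("y", "x"),
)

print(example.dims)         # the dimension names
print(example.shape)        # the number of cells along each dimension
print(example["x"].values)  # the coordinate values along x
print(example.values)       # the underlying numpy array
```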

Let's start with a simple operation: subtracting ``head`` from ``dem`` to
create a DataArray that shows us the depth of the groundwater table.

In [None]:
depth = dem - head
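
Arithmetic on DataArrays works cell by cell. The sketch below uses two made-up
2 x 2 grids (``dem_example`` and ``head_example`` are hypothetical stand-ins
for the real files) and shows that a nodata cell in either input results in
nodata in the output:

```python
import numpy as np
import xarray as xr

# Made-up coordinates shared by both grids.
coords = {"y": [1.5, 0.5], "x": [0.5, 1.5]}

dem_example = xr.DataArray(
    [[10.0, 12.0], [11.0, np.nan]], coords=coords, dims=("y", "x")
)
head_example = xr.DataArray(
    [[8.0, 9.0], [10.5, 7.0]], coords=coords, dims=("y", "x")
)

# Cell-by-cell subtraction; the nan in dem_example propagates.
depth_example = dem_example - head_example
print(depth_example.values)
```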

What does it look like? Xarray provides built-in plotting functions to directly
create (``matplotlib``) plots.

In [None]:
depth.plot()

Another typical operation is conditional evaluation. Let's classify all the areas
with relatively deep groundwater levels (deep in Dutch terms, that is).

In [None]:
is_deep = depth > 5.0
print(is_deep)

Observe that the datatype of ``is_deep`` is ``bool``. In Python, ``bool`` is a
subclass of ``int`` (many other languages treat booleans similarly): a value
of ``True`` also equals 1, and a value of ``False`` also equals 0.
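
This comes in handy for counting. A small sketch in plain Python and numpy,
using a made-up ``mask`` rather than ``is_deep``:

```python
import numpy as np

print(issubclass(bool, int))  # bool is a subclass of int
print(True + True)            # True behaves like 1 in arithmetic

# Hypothetical mask of four cells; True counts as 1 and False as 0.
mask = np.array([[True, False], [True, True]])
n_true = int(mask.sum())       # number of True cells
fraction = float(mask.mean())  # share of True cells
print(n_true, fraction)
```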

Plotting the result also shows ones and zeros:

In [None]:
is_deep.plot()

An essential feature of xarray is that it represents nodata values by
Not-A-Number (``nan``). ``nan`` values have specific behaviour that sets them
apart from other floating point values:

In [None]:
import numpy as np

print(1.0 > np.nan)
print(1.0 <= np.nan)

``nan`` is neither larger nor smaller than 1.0. After all, it's not a number.
This has some ramifications for boolean selection. It means that the result of
a ``>`` operation is not the exact inverse of a ``<=`` operation!
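
The same holds cell by cell for arrays. In the sketch below (``depth_values``
is a made-up array with one nodata cell), the nodata cell ends up ``False`` in
both comparisons, so it belongs to neither class:

```python
import numpy as np

# Hypothetical depths; the last cell is nodata.
depth_values = np.array([2.0, 7.0, np.nan])

deep = depth_values > 5.0      # False, True, False
shallow = depth_values <= 5.0  # True, False, False

# The nodata cell is False in both results.
in_either_class = deep | shallow
print(deep)
print(shallow)
print(in_either_class)
```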

We'll import ``matplotlib`` to show an example, side-by-side.

In [None]:
import matplotlib.pyplot as plt

is_shallow = depth <= 5.0
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))
is_shallow.plot(ax=ax1)
is_deep.plot(ax=ax2)

To get the exact inverse, we can use the inversion operator: ``~``.
For boolean arrays, inversion equals a logical not.

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))
is_shallow.plot(ax=ax1)
(~is_shallow).plot(ax=ax2)
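
Keep in mind that the inverse of a ``<=`` comparison marks nodata cells as
``True``, since they were ``False`` before. A minimal sketch with made-up
values (``depth_example`` is hypothetical) puts the two results side by side:

```python
import numpy as np
import xarray as xr

# Hypothetical depths; the last cell is nodata.
depth_example = xr.DataArray([2.0, 7.0, np.nan], dims=("x",))

shallow = depth_example <= 5.0  # True, False, False
not_shallow = ~shallow          # False, True, True
deep = depth_example > 5.0      # False, True, False

print(not_shallow.values)
print(deep.values)
```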

There are three more operators to be aware of:

1. and: ``&``
2. or: ``|``
3. exclusive or (xor): ``^``
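
These operators combine boolean DataArrays cell by cell. Because ``&``, ``|``
and ``^`` bind more tightly than comparisons in Python, each comparison needs
its own parentheses. A small sketch with made-up depths (``depth_example`` is
hypothetical):

```python
import numpy as np
import xarray as xr

# Hypothetical depths; the last cell is nodata.
depth_example = xr.DataArray([1.0, 4.0, 8.0, np.nan], dims=("x",))

# and: deeper than 2 and at most 5
between = (depth_example > 2.0) & (depth_example <= 5.0)
# or: at most 2 or deeper than 5
outside = (depth_example <= 2.0) | (depth_example > 5.0)
# xor: exactly one of the two conditions holds
exactly_one = (depth_example > 2.0) ^ (depth_example > 5.0)

print(between.values)
print(outside.values)
print(exactly_one.values)
```

The nodata cell is ``False`` in all three results, because every comparison
with ``nan`` is ``False``.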

Now, using the boolean results, let's select only the shallow groundwater depths.

In [None]:
shallow_depth = depth.where(cond=is_shallow)
shallow_depth.plot()

Great, we've selected only the shallow parts. What if we want to replace
values based on some condition? ``.where()`` has another keyword argument:
``other``. This value is used wherever the condition is ``False``.

Replacing all depths of 10 or more with a value of 10 is done as follows:

In [None]:
modified_depth = depth.where(cond=depth < 10.0, other=10.0)
modified_depth.plot()

Note: the default value of ``other`` is ``nan``. If you don't provide a value
for ``other``, you're not really "selecting" the data, you're actually marking
part of it as nodata!

Note that the nodata parts have also been filled with a value of ``10.0``.
The reason is as mentioned above: ``nan < 10.0`` evaluates to ``False``.
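
Both effects are easy to verify on a tiny, made-up DataArray with one nodata
cell (``depth_example`` is hypothetical):

```python
import numpy as np
import xarray as xr

# Hypothetical depths; the last cell is nodata.
depth_example = xr.DataArray([3.0, 12.0, np.nan], dims=("x",))

# Without other: cells where the condition is False become nan.
masked = depth_example.where(depth_example < 10.0)
# With other: cells where the condition is False become 10.0,
# including the nodata cell, since nan < 10.0 is False.
capped = depth_example.where(depth_example < 10.0, other=10.0)

print(masked.values)  # 3.0, nan, nan
print(capped.values)  # 3.0, 10.0, 10.0
```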

In such a case, it's most straightforward to explicitly preserve nodata values:

In [None]:
modified_depth = depth.where(cond=depth < 10.0, other=10.0).where(depth.notnull())
modified_depth.plot()

We save our ``modified_depth`` in both ``.tif`` and ``.idf`` formats for
safekeeping:

In [None]:
imod.idf.write("modified_depth.idf", modified_depth)
imod.rasterio.write("modified_depth.tif", modified_depth)

Summarizing, we've seen:

* Reading geospatial data into DataArrays
* Simple arithmetic
* Plotting
* Conditional filtering
* Writing DataArrays to disk

Those are the basics. Continue with the next tutorial to work on some more
interesting examples.