# Example Ordering

In this example notebook, we will compare the performance of the operations of the Example Operations notebook with and without spatial ordering.

For these operations, spatial ordering becomes beneficial when the chunk size is small relative to the data size.
We will simulate this by choosing a chunk size that is much smaller than normally recommended.
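
To get a feel for what "small relative to the data size" means here, the sketch below counts the chunks along the `space` dimension for the two chunk sizes mentioned in this notebook. The value 78582 is the length of `space` in the example STM printed further down; the helper name `n_chunks` is only illustrative.

```python
import math


def n_chunks(n_elements, chunksize):
    """Number of chunks needed to cover n_elements along one dimension."""
    return math.ceil(n_elements / chunksize)


n_space = 78582  # length of the "space" dimension of the example STM

for chunksize in (10000, 500):
    print(f"chunk size {chunksize:>5}: {n_chunks(n_space, chunksize)} chunks")
# chunk size 10000: 8 chunks
# chunk size   500: 158 chunks
```

With many small chunks, it matters more whether spatially neighbouring elements end up in the same chunk.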

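We will first load the example STM, order it according to the Morton code of the pixel coordinates, and then store and reload it.
Storing and reloading keeps the reordering itself out of the delayed computation, so that the delayed operations on both datasets can be compared fairly.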
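
As an illustration of why Morton ordering can help, the sketch below computes Morton codes by bit interleaving and counts how many chunks a small spatial query touches in row-major order versus Morton order. This is not the `stmtools` implementation; the function names, the 64 x 64 grid, and the query box are assumptions chosen to keep the example small.

```python
import numpy as np


def morton_code(x, y, bits=16):
    """Interleave the lowest `bits` bits of x (even positions) and y (odd positions)."""
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    code = np.zeros(np.broadcast(x, y).shape, dtype=np.int64)
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def chunks_touched(order, selected, chunksize):
    """Count the chunks containing selected elements after sorting by `order`."""
    position = np.empty_like(order)
    position[order] = np.arange(order.size)  # where each element ends up
    return np.unique(position[selected] // chunksize).size


# Hypothetical 64 x 64 pixel grid, stored row by row.
side, chunksize = 64, 64
yy, xx = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
x, y = xx.ravel(), yy.ravel()

# A spatial query: an 8 x 8 box of pixels.
in_box = (x >= 8) & (x < 16) & (y >= 8) & (y < 16)

row_major = np.arange(x.size)
morton = np.argsort(morton_code(x, y), kind="stable")

print("row-major:", chunks_touched(row_major, in_box, chunksize), "chunks")  # 8
print("Morton:   ", chunks_touched(morton, in_box, chunksize), "chunks")     # 1
```

The box in this example is aligned with the Morton blocks, which is the best case. Boxes that are not aligned touch more chunks.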

We will then repeat the operations of the Example Operations notebook on both datasets and time them:

1. Locate the entries in an STM which intersect building polygons;
2. Add year of construction as an attribute to the STM.

Finally, we will visualize the change in the order of the elements and compare the processing times.

## Prepare the data

For setup, see the [Example Operations notebook](./demo_operations_stm.ipynb).

In [1]:
from pathlib import Path
import xarray as xr
import numpy as np
import shutil
import stmtools

In [2]:
# Load the example STM.
path_stm = Path('./stm.zarr')
path_stm_ordered = Path('./stm_ordered.zarr')

# Note that normally we would advise using a chunk size closer to 10000.
# This chunk size is chosen to demonstrate the potential advantages of spatial sorting for larger datasets.
# chunksize = 10000
chunksize = 500
stmat = xr.open_zarr(path_stm)
stmat = stmat.chunk({"space": chunksize, "time": -1})

In [3]:
# Load the data again to order it.
# Note that storing the data to zarr fails when the chunk size becomes too small.
stmat_ordered_tmp = xr.open_zarr(path_stm)
stmat_ordered_tmp = stmat_ordered_tmp.chunk({"space": 1000, "time": -1})

# Reorder the STM.
stmat_ordered_tmp = stmat_ordered_tmp.stm.reorder(xlabel="azimuth", ylabel="range")

In [4]:
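# Store and reload the ordered STM.
# Remove any previous copy first; ignore_errors avoids a failure when none exists yet.
shutil.rmtree(path_stm_ordered, ignore_errors=True)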
stmat_ordered_tmp.to_zarr(path_stm_ordered)
stmat_ordered = xr.open_zarr(path_stm_ordered)
stmat_ordered = stmat_ordered.chunk({"space": chunksize, "time": -1})

In [5]:
print(stmat)

<xarray.Dataset> Size: 14MB
Dimensions:    (space: 78582, time: 10)
Coordinates:
    azimuth    (space) int64 629kB dask.array<chunksize=(500,), meta=np.ndarray>
    lat        (space) float32 314kB dask.array<chunksize=(500,), meta=np.ndarray>
    lon        (space) float32 314kB dask.array<chunksize=(500,), meta=np.ndarray>
    range      (space) int64 629kB dask.array<chunksize=(500,), meta=np.ndarray>
  * time       (time) int64 80B 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: space
Data variables:
    amplitude  (space, time) float32 3MB dask.array<chunksize=(500, 10), meta=np.ndarray>
    complex    (space, time) complex64 6MB dask.array<chunksize=(500, 10), meta=np.ndarray>
    phase      (space, time) float32 3MB dask.array<chunksize=(500, 10), meta=np.ndarray>
Attributes:
    multi-look:  coarsen-mean


In [6]:
print(stmat_ordered)

<xarray.Dataset> Size: 15MB
Dimensions:    (space: 78582, time: 10)
Coordinates:
    azimuth    (space) int64 629kB dask.array<chunksize=(500,), meta=np.ndarray>
    lat        (space) float32 314kB dask.array<chunksize=(500,), meta=np.ndarray>
    lon        (space) float32 314kB dask.array<chunksize=(500,), meta=np.ndarray>
    range      (space) int64 629kB dask.array<chunksize=(500,), meta=np.ndarray>
  * time       (time) int64 80B 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: space
Data variables:
    amplitude  (space, time) float32 3MB dask.array<chunksize=(500, 10), meta=np.ndarray>
    complex    (space, time) complex64 6MB dask.array<chunksize=(500, 10), meta=np.ndarray>
    order      (space) int64 629kB dask.array<chunksize=(500,), meta=np.ndarray>
    phase      (space, time) float32 3MB dask.array<chunksize=(500, 10), meta=np.ndarray>
Attributes:
    multi-look:  coarsen-mean


## Repeat example operations

These operations are applied to both the original STM and the ordered STM.

Note that this can take a few minutes.

In [7]:
path_polygon = Path('bag_light_AMS_WGS84.gpkg')
fields_to_query = ['bouwjaar']

In [None]:
# Example operations on original STM.
stmat_subset = stmat.stm.subset(method='polygon', polygon=path_polygon)
stmat_enriched = stmat_subset.stm.enrich_from_polygon(path_polygon, fields_to_query)
year_construction = stmat_enriched['bouwjaar'].compute()

In [None]:
# Example operations on ordered STM.
stmat_ordered_subset = stmat_ordered.stm.subset(method='polygon', polygon=path_polygon)
stmat_ordered_enriched = stmat_ordered_subset.stm.enrich_from_polygon(path_polygon, fields_to_query)
# Use a separate name so the result for the original STM is not overwritten.
year_construction_ordered = stmat_ordered_enriched['bouwjaar'].compute()

In [None]:
print(stmat_subset)

In [None]:
print(stmat_ordered_subset)

### Visualize the results

The images below are colored by element index.

In [None]:
from matplotlib import pyplot as plt
import matplotlib.cm as cm

In [None]:
# Visualize original results.
fig, ax = plt.subplots()
plt.title("Element index, original")
plt.scatter(stmat_enriched.lon.data, stmat_enriched.lat.data, c=np.arange(len(stmat_enriched.lon)), s=0.004, cmap=cm.jet)
plt.colorbar()

In [None]:
# Visualize ordered results.
fig, ax = plt.subplots()
plt.title("Element index, ordered")
plt.scatter(stmat_ordered_enriched.lon.data, stmat_ordered_enriched.lat.data, c=np.arange(len(stmat_ordered_enriched.lon)), s=0.004, cmap=cm.jet)
plt.colorbar()

### Compare processing times

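Note that the timings are measured in separate cells rather than during the runs above, because variables assigned inside a statement timed with the `%timeit` magic function are not kept. The `-o` option returns the timing result so that it can be stored in a variable.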
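
Outside a notebook, where the `%timeit` magic is not available, a comparable measurement can be sketched with the standard library `timeit` module. The helper `time_call` and the dummy workload are illustrative assumptions; in practice the callable would wrap one of the STM operations.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=1):
    """Return (mean, stdev) in seconds per call over `repeat` runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Dummy workload standing in for an STM operation.
mean, stdev = time_call(lambda: sum(range(100_000)))
print(f"{mean:.6f} s ± {stdev:.6f} s per call")
```

Unlike `%timeit`, this sketch does not choose the number of loops automatically, so `repeat` and `number` have to be set to suit the duration of the operation.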

In [None]:
# Compute timings of ordering STM.
time_ordering = %timeit -o stmat.stm.reorder(xlabel="azimuth", ylabel="range")

In [None]:
# Compute timings of operations on original STM.
time_subset = %timeit -o stmat.stm.subset(method='polygon', polygon=path_polygon)
time_enrich = %timeit -o stmat_subset.stm.enrich_from_polygon(path_polygon, fields_to_query)
time_enrich_compute = %timeit -o stmat_enriched['bouwjaar'].compute()

In [None]:
# Compute timings of operations on ordered STM.
time_ordered_subset = %timeit -o stmat_ordered.stm.subset(method='polygon', polygon=path_polygon)
time_ordered_enrich = %timeit -o stmat_ordered_subset.stm.enrich_from_polygon(path_polygon, fields_to_query)
time_ordered_enrich_compute = %timeit -o stmat_ordered_enriched['bouwjaar'].compute()

In [None]:
print(f"Ordering:           {time_ordering}")

In [None]:
print(f"Subset (original):  {time_subset}")
print(f"Enrich (original):  {time_enrich}")
print(f"Compute (original): {time_enrich_compute}")

In [None]:
print(f"Subset (ordered):   {time_ordered_subset}")
print(f"Enrich (ordered):   {time_ordered_enrich}")
print(f"Compute (ordered):  {time_ordered_enrich_compute}")

In [None]:
# Positive differences mean that the operation was faster on the ordered STM.
print(f"Subset (diff):      {time_subset.average - time_ordered_subset.average}")
print(f"Enrich (diff):      {time_enrich.average - time_ordered_enrich.average}")
print(f"Compute (diff):     {time_enrich_compute.average - time_ordered_enrich_compute.average}")
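
Absolute differences can be hard to interpret on their own. As a possible follow-up, the sketch below expresses the comparison as a relative speedup and estimates after how many runs the one-off ordering cost is recovered. Both helper names are hypothetical, and the functions take plain numbers in seconds.

```python
import math


def relative_speedup(t_original, t_ordered):
    """Ratio of the timings; a value above 1 means the ordered STM was faster."""
    return t_original / t_ordered


def break_even_runs(t_ordering, t_original, t_ordered):
    """Number of runs after which the one-off ordering cost is recovered.

    Returns None if ordering did not save any time per run.
    """
    saving = t_original - t_ordered
    if saving <= 0:
        return None
    return math.ceil(t_ordering / saving)
```

In this notebook the inputs would be the `.average` values of the timing results, for example `relative_speedup(time_enrich_compute.average, time_ordered_enrich_compute.average)`.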