In [1]:
import os
import xarray as xr
import xdggs
import zarr
# Point DGGRID_PATH at the dggrid executable on your machine (adjust this path).
os.environ['DGGRID_PATH']='/home/dick/micromamba/envs/xdggs/bin/dggrid'
from xdggs_dggrid4py.IGEO7 import IGEO7Index
import warnings
# Silence all warnings for cleaner notebook output (this also covers RuntimeWarning).
warnings.filterwarnings("ignore")

## A simple step-by-step how-to
This notebook shows how to convert a 2D raster dataset into a 1D dataset indexed by DGGS (IGEO7) cell IDs. It covers:
- preparing the data
- converting the 2D coordinates to DGGS cell IDs
- some use cases for the DGGS-indexed dataset

### Prepare Data

In [2]:
# data source : https://data.opendatascience.eu/geonetwork/srv/eng/catalog.search#/metadata/356923ff-88a1-4770-8bc7-3de7584079be
data = xr.open_dataset("https://s3.eu-central-1.wasabisys.com/eumap/aq/aq_pm25_et.eml_m_1km_na_201812_eumap_epsg3035_v0.1prebeta.tif", band_as_variable=True, engine='rasterio')
data

We need to specify the DGGS attributes used for the conversion; their content varies from one DGGS to another.
The full set of attributes for IGEO7 is:
```python
    {    "grid_name": "igeo7",
             "level": -1,  # the required resolution, or -1 to calculate the resolution automatically
          "src_epsg": "EPSG:3035", # the EPSG code of the source data; change it to match your data
           "method" : "nearestpoint", # "centroid" or "nearestpoint"
       "coordinate" : ["x","y"], # coordinate names in the xarray dataset; the order must match the one passed to stack below
               "mp" : 1, # number of processes to use for multiprocessing
             "chunk" : (100,100), # block size; when given, the extent is processed in smaller blocks (defaults to the whole extent)
    }
```
After that, we assign the attributes to one of the existing coordinates; in this example, either x or y.

In [3]:
attrs={"grid_name": "igeo7",
        "level": -1,
        "src_epsg": "EPSG:3035",
        "method" : "nearestpoint",
        "coordinate" : ["x","y"],
        "chunk": (500,500),
        "mp": 6}
data['y'].attrs=attrs
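
A typo in the attribute dictionary is easy to miss until the conversion is already running, so a quick sanity check beforehand can help. The sketch below is only an illustration: `check_igeo7_attrs` is a hypothetical helper, not part of xdggs or xdggs_dggrid4py, and the rules it applies are assumptions derived from the commented dictionary above.

```python
# Hypothetical helper: not part of xdggs or xdggs_dggrid4py.
# The rules below are assumptions based on the commented attribute dictionary.
KNOWN_KEYS = {"grid_name", "level", "src_epsg", "method", "coordinate", "mp", "chunk"}
ALLOWED_METHODS = {"centroid", "nearestpoint"}


def check_igeo7_attrs(attrs):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    unknown = sorted(set(attrs) - KNOWN_KEYS)
    if unknown:
        problems.append(f"unknown keys: {unknown}")
    if attrs.get("method") not in ALLOWED_METHODS:
        problems.append(f"method must be one of {sorted(ALLOWED_METHODS)}")
    coordinate = attrs.get("coordinate")
    if not isinstance(coordinate, (list, tuple)) or len(coordinate) != 2:
        problems.append("coordinate must name exactly two coordinates")
    level = attrs.get("level")
    if not isinstance(level, int) or level < -1:
        problems.append("level must be an integer resolution, or -1 for automatic")
    chunk = attrs.get("chunk")
    if chunk is not None and not (
        len(chunk) == 2 and all(isinstance(c, int) and c > 0 for c in chunk)
    ):
        problems.append("chunk must be a pair of positive integers")
    return problems


notebook_attrs = {
    "grid_name": "igeo7",
    "level": -1,
    "src_epsg": "EPSG:3035",
    "method": "nearestpoint",
    "coordinate": ["x", "y"],
    "chunk": (500, 500),
    "mp": 6,
}
print(check_igeo7_attrs(notebook_attrs))
```

For the dictionary used in this notebook the helper reports no problems; a misspelled method name or a coordinate list with the wrong length would show up in the returned list.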

### Conversion from 2D coordinates to DGGS cell id
To perform the conversion, we use xarray's `stack` function to create a new index with the provided `IGEO7Index` class.
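
To see why the order of the coordinate names matters, it may help to look at what stacking does to a tiny grid. The pure-Python sketch below mimics the ordering only; the toy coordinates and values are made up for illustration, and it assumes that stacking over `("x", "y")` enumerates the pairs with `x` varying slowest, which is how a product of the two coordinates behaves.

```python
from itertools import product

# Toy 3 x 2 grid; coordinates and values are invented for illustration.
x = [0, 1, 2]
y = [10, 20]
grid = {(xi, yi): 100 * xi + yi for xi in x for yi in y}

# Stacking over ("x", "y") pairs every x with every y, x varying slowest.
stacked_coords = list(product(x, y))
stacked_values = [grid[pair] for pair in stacked_coords]

for pair, value in zip(stacked_coords, stacked_values):
    print(pair, value)
```

Each `(x, y)` pair becomes one entry along the new 1D dimension, and each entry is what gets mapped to a DGGS cell ID. Swapping the order to `("y", "x")` would enumerate the same pairs in a different sequence, which is why the `coordinate` attribute has to list the names in the same order as the `stack` call.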

Notes on performance, for a raster of 3472 x 3857 pixels at resolution 9:
- Conversion time with the whole extent, mp=1: about 8 min (6 GB RAM)
- Conversion time with chunk (500, 500), mp=6: about 2 min (2 GB RAM)
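
The number of jobs that the chunked run is split into can be estimated from the raster shape and the block size. This is a minimal sketch, assuming the extent is tiled into blocks with partial blocks at the edges; `count_jobs` is a hypothetical helper, not a function of the library.

```python
import math


def count_jobs(shape, chunk):
    """Number of blocks needed to tile `shape` with blocks of size `chunk`."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunk))


raster_shape = (3472, 3857)  # pixels along x and y
chunk = (500, 500)

jobs = count_jobs(raster_shape, chunk)
job_size = chunk[0] * chunk[1]  # pixels in a full block
print(jobs, job_size)
```

For this raster the estimate is 7 x 8 = 56 jobs of up to 250000 pixels each, which agrees with the job count and job size reported in the conversion log.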

In [4]:
%%time
dggs_data = data.stack(cell_ids=("x", "y"), index_cls=IGEO7Index)

c1 shape: ((3472,)), c2 shape: ((3857,))
Calculate Auto resolution
1561500.0,2542500.0,5417500.0,6013500.0
Total Bounds (EPSG:3035): [2542500. 1561500. 6013500. 5417500.]
Total Bounds (wgs84): [-33.13136656  35.06597435  51.54843604  67.09363937]
Total Bounds Area (km^2): 20793600.108343568
Area per center point (km^2): 1.5527456892327829
Auto resolution : 9, area: 1.2639902 km2
--- Multiprocessing 6 ---
---Generate Cell ID with resolution 9 by nearestpoint, number or job: 56, job size: 250000, chunk: (500, 500) ---


  0%|          | 0/56 [00:00<?, ?it/s]

cell generation time: (123.13359808921814)
Cell ID calcultion completed, unique cell id :10047670
CPU times: user 3.44 s, sys: 1.08 s, total: 4.51 s
Wall time: 2min 11s
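
The figures in the log can be reproduced with a little arithmetic. The sketch below is not the library's code: it assumes the average cell area of an aperture-7 icosahedral grid is the Earth's surface area divided by `10 * 7**r + 2` cells, uses an authalic-sphere area of about 510065621.724 km², and picks the first resolution whose cell area does not exceed the area per pixel. That selection rule is an assumption; the actual rule may differ.

```python
EARTH_AREA_KM2 = 510_065_621.724  # assumed authalic-sphere surface area


def igeo7_cell_area_km2(resolution):
    """Average cell area, assuming 10 * 7**r + 2 cells at resolution r."""
    return EARTH_AREA_KM2 / (10 * 7**resolution + 2)


def auto_resolution(area_per_point_km2, max_resolution=20):
    """Assumed rule: first resolution whose cell area fits within the area per point."""
    for r in range(max_resolution + 1):
        if igeo7_cell_area_km2(r) <= area_per_point_km2:
            return r
    return max_resolution


total_area_km2 = 20793600.108343568  # "Total Bounds Area" from the log
n_points = 3472 * 3857               # number of pixels
area_per_point = total_area_km2 / n_points

resolution = auto_resolution(area_per_point)
print(area_per_point, resolution, igeo7_cell_area_km2(resolution))
```

Under these assumptions the area per pixel comes out at about 1.55 km², and resolution 9, with an average cell area of about 1.264 km², is the first level fine enough, matching the values in the log.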


After conversion, the dataset has a single dimension, as shown below.
However, using the stack method has some drawbacks:
- The index can't be used immediately after conversion, because it becomes a multi-index `(cell_ids, x, y)`
- The attributes are not carried over to the newly created index

So we have to:
1. manually assign the attributes back to `cell_ids`
2. save the dataset to disk and load it back to decompose the multi-index

In [5]:
dggs_data

In [7]:
# Very important! Copy the grid attributes to cell_ids
dggs_data.cell_ids.attrs = dggs_data.xindexes.get('cell_ids')._grid.to_dict()
# Save to Zarr, compressing with Blosc/zstd (shuffle=2 selects bit shuffle)
compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
dggs_data.to_zarr('dggs_data.zar', encoding={"band_1": {"compressor": compressor}, "cell_ids": {"compressor": compressor}})

<xarray.backends.zarr.ZarrStore at 0x7f01938d3d90>

After loading the dataset from disk again, the multi-index is gone and `cell_ids` has a single index, but it is loaded as a plain `PandasIndex`.
We can use the `xdggs.decode` function to re-initialize it as an IGEO7 index.

In [8]:
dggs_zarr = xr.open_zarr('./dggs_data.zar')
dggs_zarr 

Dask array summary (one row per array in the dataset):

| Bytes | Chunk bytes | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|
| 102.17 MiB | 817.36 kiB | (13391504,) | (104622,) | 128 chunks in 2 graph layers | float64 numpy.ndarray |
| 102.17 MiB | 817.36 kiB | (13391504,) | (104622,) | 128 chunks in 2 graph layers | float64 numpy.ndarray |
| 51.08 MiB | 817.36 kiB | (13391504,) | (209243,) | 64 chunks in 2 graph layers | float32 numpy.ndarray |
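
The sizes in this summary follow directly from the number of cells, the item size of each data type, and the chunk length. As a quick check, and only as a sketch (`array_summary` is a hypothetical helper), the numbers can be recomputed like this:

```python
import math


def array_summary(n_elements, itemsize, chunk_len):
    """Return (total MiB, chunk kiB, number of chunks) for a 1D array."""
    total_mib = n_elements * itemsize / 2**20
    chunk_kib = chunk_len * itemsize / 2**10
    n_chunks = math.ceil(n_elements / chunk_len)
    return round(total_mib, 2), round(chunk_kib, 2), n_chunks


n_cells = 3472 * 3857  # one entry per pixel of the source raster

float64_summary = array_summary(n_cells, itemsize=8, chunk_len=104622)
float32_summary = array_summary(n_cells, itemsize=4, chunk_len=209243)
print(float64_summary, float32_summary)
```

The 1D length of 13391504 is simply the 3472 x 3857 pixels of the source raster. The float32 array holds twice as many elements per chunk as the float64 arrays, so the chunk size in bytes is nearly identical while the chunk count halves.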


In [9]:
dggs_zarr = xdggs.decode(dggs_zarr) 
dggs_zarr

Dask array summary (one row per array in the dataset):

| Bytes | Chunk bytes | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|
| 102.17 MiB | 817.36 kiB | (13391504,) | (104622,) | 128 chunks in 2 graph layers | float64 numpy.ndarray |
| 102.17 MiB | 817.36 kiB | (13391504,) | (104622,) | 128 chunks in 2 graph layers | float64 numpy.ndarray |
| 51.08 MiB | 817.36 kiB | (13391504,) | (209243,) | 64 chunks in 2 graph layers | float32 numpy.ndarray |


In [10]:
# We no longer need the x and y coordinates
dggs_zarr = dggs_zarr.drop_vars(['x','y'])

In [11]:
# Some basic operations on the dataset with xdggs: select cells by latitude/longitude
dggs_zarr.dggs.sel_latlon(latitude=[44.56375059,44.56369803],longitude=[6.68935115])

Dask array summary (one row per array in the selection):

| Bytes | Chunk bytes | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|
| 4 B | 4 B | (1,) | (1,) | 1 chunks in 3 graph layers | float32 numpy.ndarray |


In [12]:
# Generate the cell centroids for the selected cell IDs; if none are given, generate them for all cells.
# Returns a DataArray
dggs_zarr.dggs.cell_centers()

  0%|          | 0/26784 [00:00<?, ?it/s]

In [13]:
# Generate the cell polygons for the selected cell IDs; if none are given, generate them for all cells.
# Returns a DataArray
dggs_zarr.dggs.cell_boundaries()

  0%|          | 0/26784 [00:00<?, ?it/s]