A reordering function for stmtools #34

Closed
rogerkuou opened this issue Aug 10, 2023 · 9 comments

@rogerkuou
Member

rogerkuou commented Aug 10, 2023

We would like to have a reordering function for stmtools, so that points that are spatially close to each other are also close in the point order. This will benefit the enrichment function.

Requirements:

  • It only loads 2D coordinates needed for reordering.
  • The reordering only needs to be applied on the point dimension.
  • The other data variables and coordinates should remain delayed.

Example application:

import xarray as xr
import stmtools  # assumed to register the .stm accessor on xarray objects

stmat = xr.open_zarr('stm.zarr')

# Reorder stmat
stmat_reorder = stmat.stm.reorder(xlabel='X', ylabel='Y')

Example dataset can be retrieved from here

@rogerkuou
Member Author

An example notebook has been made by Thijs. The next step is to implement the Morton order as a function.
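For illustration, a minimal sketch of a 2D Morton (Z-order) key in plain Python, assuming non-negative integer coordinates. The name morton_2d and the bits parameter are made up for this example and are not part of stmtools.

```python
def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the lowest `bits` bits of x and y into one Morton key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative integers")
    if x >> bits or y >> bits:
        raise ValueError(f"coordinates must fit in {bits} bits")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Sorting by the key visits the points along a Z-shaped curve.
points = [(3, 3), (0, 0), (2, 1), (0, 1), (1, 0), (1, 1)]
ordered = sorted(points, key=lambda p: morton_2d(*p))
print(ordered)
# → [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 3)]
```

The bits argument makes the precision explicit: two coordinates of 16 bits each give a key that fits in 32 bits.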

@vanlankveldthijs
Contributor

vanlankveldthijs commented Oct 19, 2023

So, a function for reordering should be made part of the stm extension to xarray (in stmtools.git: stmtools/stm.py).

Ideally, the function evaluates only the point coordinates and leaves everything else delayed, to reduce the strain on memory.

@vanlankveldthijs
Contributor

It could be that any reordering operation on an xarray object has to evaluate all the point attributes. In that case, we may also have to implement some sort of redirection array (holding only x, y, and the index into the original array).
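A rough sketch of such a redirection array with NumPy, assuming the coordinates are non-negative integers below 2**16 (larger values would be silently masked here). The helper names spread_bits and morton_order are hypothetical. Only x and y are touched; the resulting permutation can then be applied to any other per-point array.

```python
import numpy as np


def spread_bits(v: np.ndarray) -> np.ndarray:
    """Insert a zero bit between each of the low 16 bits of v."""
    v = v.astype(np.uint32) & np.uint32(0x0000FFFF)
    v = (v | (v << np.uint32(8))) & np.uint32(0x00FF00FF)
    v = (v | (v << np.uint32(4))) & np.uint32(0x0F0F0F0F)
    v = (v | (v << np.uint32(2))) & np.uint32(0x33333333)
    v = (v | (v << np.uint32(1))) & np.uint32(0x55555555)
    return v


def morton_order(x, y) -> np.ndarray:
    """Return indices into the original arrays, sorted by Morton key."""
    keys = spread_bits(np.asarray(x)) | (spread_bits(np.asarray(y)) << np.uint32(1))
    return np.argsort(keys, kind="stable")


x = np.array([3, 0, 2, 0, 1, 1])
y = np.array([3, 0, 1, 1, 0, 1])
order = morton_order(x, y)  # the redirection array
print(order)
# → [1 4 3 5 2 0]

# Any other per-point attribute is reordered by plain indexing.
attribute = np.array(["a", "b", "c", "d", "e", "f"])
print(attribute[order])
# → ['b' 'e' 'd' 'f' 'c' 'a']
```

With xarray, the same index array could presumably be passed to isel on the point dimension, so that the other variables are only indexed and not evaluated up front.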

@vanlankveldthijs
Contributor

I looked at a few lightweight Morton-ordering Python tools.

A very generic and simple one is trevorprater/pymorton. This one has two disadvantages though:

  1. If you want to order lat-lon pairs, the output is a base-4 string.
  2. More importantly, you cannot specify the precision. This is always set to 32 or 64 bits depending on your system.

There are several geohashing Python tools. The one that is currently most popular is https://pypi.org/project/python-geohash/
This package is less generic in that it expects lat-lon pairs (which is fine for our purpose), but it does allow setting the precision, and it outputs the hash in base-32. Additionally, the functional part is implemented in C++ as opposed to Python.
It is poorly documented, but that need not be a problem given its limited scope and straightforward functionality.

I will have to check whether computation time could be a limiting factor for either tool.
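To make "base-32 hash with a settable precision" concrete, here is a small pure-Python sketch of the standard geohash encoding. It is written for illustration only and is not the implementation of python-geohash; the function name geohash_encode is an assumption of this example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a lat-lon pair as a geohash of `precision` base-32 characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0       # bits collected for the current character
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=3))
# → u4p
```

Because the bits of longitude and latitude are interleaved, a geohash is essentially a Morton key written in base-32, and a shorter hash is a prefix of a longer one for the same point.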

@rogerkuou
Member Author

Once the ordering hash/index is computed, the sorting can be done with xarray's sortby function.

If the single column holding the ordering hash/index is too big to persist in memory, we can first write the ordering index to disk using the existing chunks, then reload the whole dataset lazily, and finally sort by the lazy index.
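A small sketch of the sortby step on a toy dataset. The dimension name points, the variable amplitude, the coordinate name order, and the key values are all made up for this example; the keys are a small in-memory array here, not a lazily loaded one.

```python
import numpy as np
import xarray as xr

# Toy dataset: 4 points, 3 epochs.
ds = xr.Dataset(
    data_vars={"amplitude": (("points", "time"), np.arange(12.0).reshape(4, 3))},
    coords={
        "X": ("points", [3, 0, 1, 0]),
        "Y": ("points", [3, 0, 0, 1]),
    },
)

# Precomputed ordering keys, one per point (hypothetical values).
order_key = np.array([15, 0, 1, 2])

# Attach the keys as a coordinate on the point dimension and sort by it.
ds_sorted = ds.assign_coords(order=("points", order_key)).sortby("order")
print(ds_sorted["X"].values, ds_sorted["Y"].values)
# → [0 1 0 3] [0 0 1 3]
```

All variables and coordinates that share the point dimension are reordered together, so the ordering key is the only thing that has to be computed beforehand.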

@vanlankveldthijs
Contributor

We also decided to (initially) sort by image (pixel) coordinate.
This has the advantage of being a local coordinate system (less precision is needed to fully specify each point),
and it makes producing an integer hash more intuitive.

@vanlankveldthijs
Contributor

Also, we briefly discussed the timing of the sorting procedure.

Ideally this should be done immediately after pixel selection to prevent writing data chunks that will have to be overwritten after sorting.

However, we also need to be able to work with pre-existing data that is already chunked.

Maybe this means there should be two sorting procedures, or at least two ways of commencing the sorting.

@vanlankveldthijs
Contributor

vanlankveldthijs commented Oct 26, 2023

Example delayed function: stm.py:enrich_from_polygon -> xr.map_blocks(...)

Better yet: sarxarray/stack.py:_get_phase(...) -> da.apply_gufunc(...)
Note: apply_gufunc expects the function, then the 'signature', then the function arguments, and finally the meta or dtype describing the function's output.
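As a hedged sketch of that call pattern, the snippet below computes per-point keys block by block with dask. The function morton_keys is hypothetical and assumes integer pixel coordinates below 2**16. The output dtype is given through output_dtypes; passing a meta array instead is the alternative.

```python
import dask.array as da
import numpy as np


def morton_keys(x, y):
    """Element-wise Morton key for integer arrays x and y (values < 2**16)."""
    xu = x.astype(np.uint32)
    yu = y.astype(np.uint32)
    key = np.zeros(xu.shape, dtype=np.uint32)
    for i in range(16):
        key |= ((xu >> i) & 1) << (2 * i)
        key |= ((yu >> i) & 1) << (2 * i + 1)
    return key


x = da.from_array(np.array([3, 0, 2, 0, 1, 1]), chunks=3)
y = da.from_array(np.array([3, 0, 1, 1, 0, 1]), chunks=3)

# "(),()->()" means two scalar inputs and one scalar output per element,
# so dask applies the function to each pair of blocks independently.
keys = da.apply_gufunc(morton_keys, "(),()->()", x, y, output_dtypes=np.uint32)

print(type(keys).__name__)  # still a lazy dask Array
print(keys.compute())
# → [15  0  6  2  1  3]
```

Nothing is evaluated until compute() is called, which is what keeps the key computation delayed.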

@rogerkuou
Member Author

Function added via #56. Documentation needs to be added (#57).
