# Annual Sentinel-2 Geomedian run with odc-stats

Useful links:
* [odc-stats](https://github.com/opendatacube/odc-stats)
* [crop-mask plugin](https://github.com/digitalearthafrica/crop-mask/blob/main/production/cm_tools/cm_tools/gm_ml_pred.py)
* [odc-algo geomedians](https://github.com/opendatacube/odc-algo/blob/main/odc/algo/_geomedian.py#L337)
* [example geomedian config files](https://github.com/GeoscienceAustralia/dea-config/tree/09fa937a9c79e3505e85d2364a30bc002ca0c5f3/dev/services/odc-stats/geomedian)

In [11]:
# !pip uninstall s2_gm_tools -y
!pip install s2_gm_tools/

Processing ./s2_gm_tools
  Preparing metadata (setup.py) ... [?25ldone
[?25hBuilding wheels for collected packages: s2-gm-tools
  Building wheel for s2-gm-tools (setup.py) ... [?25ldone
[?25h  Created wheel for s2-gm-tools: filename=s2_gm_tools-1.0.0-py3-none-any.whl size=4609 sha256=89a382540d394e42e075c5c499e834f9ab6f7b22c9089497ac7239aaa32d2e8b
  Stored in directory: /tmp/pip-ephem-wheel-cache-4cenrhb2/wheels/4e/ba/0e/f78ae9c2f6d8472eaed9090aa95acc9d2252f1e66a22e9b339
Successfully built s2-gm-tools
Installing collected packages: s2-gm-tools
Successfully installed s2-gm-tools-1.0.0


In [1]:
import os
import json
import warnings
import xarray as xr
import rioxarray as rxr
import geopandas as gpd
import matplotlib.pyplot as plt
from odc.geo.xr import assign_crs
from odc.stats.tasks import TaskReader
from odc.stats.model import OutputProduct

warnings.filterwarnings("ignore")

## Analysis Parameters

Some tile ids to run
* 'x43y14' # se aus forests Alps.
* 'x39y09' # West tassie
* 'x33y26' # Central Aus with salt lakes
* 'x31y43' # Tropical NT
* 'x19y18' # Esperance crops and sand dunes
* 'x42y38' # Qld tropical forests
* 'x39y13' # Melbourne city and bay+crops
* 'x12y19' # Perth City
* 'x41y12' # Complex coastal in Vic.

In [None]:
# tiles = ['x30y34','x36y52','x61y30','x58y22','x57y28', 'x61y29', 'x64y32', 'x65y40', 'x60y53' ,'x55y51', 'x46y58', 'x46y46', 'x36y34']
# gdf = gpd.read_file('~/gdata1/projects/s2_gm/testing_tile_suite.geojson')

# gdf = gdf[gdf['region_code'].isin(tiles)]
# gdf.reset_index(drop=True).to_file('~/gdata1/projects/s2_gm/testing_tile_suite_13tiles.geojson')

In [2]:
year='2022'
t = 39,9  # tile id to run i.e. x19y18
resolution = 100 # can coarsen resolution to run to speed up testing
products='ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3' # use all S2 observations
name, version = 'ga_s2_gm_cyear_3', '0-0-1' #product name and version
results = '/gdata1/projects/s2_gm/results/' #where are we outputting results?
ncpus=15
mem='100Gi'

## Save tasks database etc.

In [3]:
os.system("odc-stats save-tasks "\
          "--grid au-10 "\
          f"--year {year} "\
          f"--input-products {products}"
         )

# !odc-stats save-tasks --grid au-10 --year 2020 --input-products ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3

config from yaml {} None
[2025-05-20 22:58:16,854] {_cli_save_tasks.py:176} INFO - Config overrides: {'grid': 'au-10', 'input_products': 'ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3'}
[2025-05-20 22:58:16,854] {_cli_save_tasks.py:179} INFO - Using config: {'grid': 'au-10', 'input_products': 'ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3', 'complevel': 6}


  t = Period(begin, freq)


Connecting to the database, streaming datasets
Training compression dictionary
.. done
Count: 136,853
       661.3 per second
Total: 206.957 sec
TTFB :  0.045 sec
.....: 643CFD16C64845ADA7D5FB9619E2076D
..
Total of 1,512 spatial tiles
Total of 136,320 unique dataset IDs after filtering
Saving tasks to disk (1512)
.. done
Writing summary to ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3_2022--P1Y.csv
Dumping GeoJSON(s)
..writing to ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3_2022--P1Y-2022--P1Y.geojson


0

## Find the tile ID to run

We'll pass this index to odc-stats next to tell it to run this tile

In [4]:
## Open the task database to find out tiles
op = OutputProduct(
            name=name,
            version=version,
            short_name=name,
            location=f"s3://dummy-bucket/{name}/{version}",
            properties={"odc:file_format": "GeoTIFF"},
            measurements=['nbart_red'],
        )

taskdb = TaskReader(f'{products}_{year}--P1Y.db', product=op)
task = taskdb.load_task((f'{year}--P1Y', t[0], t[1]))

# Now find index of the tile we want to run
# We'll pass this index to odc-stats next to tell it to run this tile
tile_index_to_run = []
all_tiles = list(taskdb.all_tiles)
for i, index in zip(all_tiles, range(0, len(all_tiles))):
    if (i[1]==t[0]) & (i[2]==t[1]):
        tile_index_to_run.append(index)
        print(index)

990


### Optionally view tile to check location

The next cell will plot the tile extent on an interactive map so you can ensure its the tile you want to run.

In [5]:
# with open('task_tile_check.geojson', 'w') as fh:
#     json.dump(task.geobox.extent.to_crs('epsg:4326').json, fh, indent=2)

gdf = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[task.geobox.extent.to_crs('epsg:4326').geom])
gdf.explore()

## Run the geomedian algo using odc-stats

Put this link into the dask dashboard to view the progress, altering the email address to yours: https://app.sandbox.dea.ga.gov.au/user/chad.burton@ga.gov.au/proxy/8787/status

In [45]:
%%time
os.system("odc-stats run "\
          f"{products}_{year}--P1Y.db "\
          "--config=s2_gm_tools/s2_gm_tools/config/config_gm_s2_annual_s2Cloudless_enhanced.yaml "\
          f"--resolution={resolution} "\
          f"--threads={ncpus} "\
          f"--memory-limit={mem} "\
          f"--location=file:///home/jovyan/{results}{name}/{version}" f"{tile_index_to_run[0]}"
         )

[2025-05-21 00:13:26,600] {_cli_run.py:168} INFO - Config overrides: {'filedb': 'ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3_2022--P1Y.db', 'threads': 15, 'memory_limit': '100Gi', 'output_location': 'file:///home/jovyan//gdata1/projects/s2_gm/results/ga_s2_gm_cyear_3/0-0-1990'}
[2025-05-21 00:13:26,600] {_cli_run.py:200} INFO - Using this config: TaskRunnerConfig(filedb='ga_s2am_ard_3-ga_s2bm_ard_3-ga_s2cm_ard_3_2022--P1Y.db', aws_unsigned=True, plugin='s2_gm_tools.s2_gm_plugin.GMS2AUS', plugin_config={'resampling': 'cubic', 'bands': ['nbart_red', 'nbart_green', 'nbart_blue'], 'rgb_bands': ['nbart_red', 'nbart_green', 'nbart_blue'], 'mask_band': 'oa_s2cloudless_mask', 'proba_band': 'oa_s2cloudless_prob', 'contiguity_band': 'nbart_contiguity', 'nodata_classes': ['nodata'], 'cp_threshold': 0.1, 'cloud_filters': {'cloud': [['opening', 2], ['dilation', 3]]}, 'aux_names': {'smad': 'sdev', 'emad': 'edev', 'bcmad': 'bcdev', 'count': 'count'}}, product={'name': 'ga_s2_gm_cyear_3', 'short_name': 

Traceback (most recent call last):
  File "/env/bin/odc-stats", line 8, in <module>
    sys.exit(main())
  File "/env/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/env/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/env/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/env/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/env/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/env/lib/python3.10/site-packages/odc/stats/_cli_run.py", line 233, in run
    for result in result_stream:
  File "/env/lib/python3.10/site-packages/odc/stats/proc.py", line 213, in _run
    client = self.client()
  File "/env/lib/python3.10/site-packages/odc/stats/proc.py", li

CPU times: user 4.68 ms, sys: 610 μs, total: 5.29 ms
Wall time: 3.72 s


256

In [46]:
!pip list | grep odc

odc-algo                          0.2.3
odc-cloud                         0.2.5
odc-dscache                       0.2.3
odc-geo                           0.4.8
odc-io                            0.2.2
odc-stac                          0.3.10
odc-stats                         1.0.57
odc-ui                            0.2.1


In [44]:
# !pip uninstall s2_gm_tools -y
!pip install s2_gm_tools/

Processing ./s2_gm_tools
  Preparing metadata (setup.py) ... [?25ldone
[?25hBuilding wheels for collected packages: s2-gm-tools
  Building wheel for s2-gm-tools (setup.py) ... [?25ldone
[?25h  Created wheel for s2-gm-tools: filename=s2_gm_tools-1.0.0-py3-none-any.whl size=4609 sha256=d8c6dd1125c4760d98266b05a9cec42698041f57f10a134bb38f6c0070a32979
  Stored in directory: /tmp/pip-ephem-wheel-cache-oly0d1xf/wheels/4e/ba/0e/f78ae9c2f6d8472eaed9090aa95acc9d2252f1e66a22e9b339
Successfully built s2-gm-tools
Installing collected packages: s2-gm-tools
Successfully installed s2-gm-tools-1.0.0


## Plot the RGBA output

In [None]:
t = 39,9  # tile id
name, version = 'ga_s2_gm_cyear_3', '0-0-1'
results = '/gdata1/projects/s2_gm/results/'

In [None]:
x= f'x{t[0]}'
y= f'y{t[1]}'

path = f'{results}{name}/{version}/{x}/{y}/{year}--P1Y/{name}_{x}{y}_{year}--P1Y_final_rgba.tif'
rgba=rxr.open_rasterio(path)
rgba=assign_crs(rgba, crs='EPSG:3577')

rgba.plot.imshow(size=6);
plt.title(a+b);

## Interactively explore results

In [None]:
red_path = f'{results}{name}/{version}/{x}/{y}/{year}--P1Y/{name}{x}{y}_{year}--P1Y_final_nbart_red.tif'
green_path = f'{results}{name}/{version}/{x}/{y}/{year}--P1Y/{name}{x}{y}_{year}--P1Y_final_nbart_green.tif'
blue_path = f'{results}{name}/{version}/{x}/{y}/{year}--P1Y/{name}{x}{y}_{year}--P1Y_final_nbart_blue.tif'

r=assign_crs(rxr.open_rasterio(red_path).squeeze().drop_vars('band'),crs='EPSG:3577')
g=assign_crs(rxr.open_rasterio(green_path).squeeze().drop_vars('band'),crs='EPSG:3577')
b=assign_crs(rxr.open_rasterio(blue_path).squeeze().drop_vars('band'),crs='EPSG:3577')

r = r.rename('nbart_red')
g = g.rename('nbart_green')
b = b.rename('nbart_blue')

ds = assign_crs(xr.merge([r,g,b]), crs='EPSG:3577')

In [None]:
ds.odc.explore()

## Remove all files

In [None]:
# !rm -r -f results/ga_s2_gm_cyear_3/