Datacube #27

Merged · 21 commits merged into main from datacube_fusion · Nov 15, 2023

Conversation

@lillythomas (Contributor):

This script collects overlapping Sentinel-1, Sentinel-2 and DEM data for an AOI and time range, aligns them to a common extent, and merges the data arrays. (A rough sketch of the search-and-stack flow follows the to-do list.)

To do:

  • fix the search_sentinel1_calc_max_area() function
  • choose the time to reset the merged DataArray's time attribute to (right now it holds all of the respective times of its child arrays in the time dimension)
  • add the tiler
  • filter cloudiness on tile level
  • filter out tiles with high percent of nodata for any specific channel
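
For orientation, here is a minimal sketch of the search-and-stack flow, assuming pystac-client against a STAC API (the endpoint, collection name, AOI and dates below are illustrative, not the script's actual values):

```python
# Sketch only: STAC search + lazy stacking with stackstac.
import pystac_client
import stackstac

# Hypothetical endpoint and AOI; the real script's values may differ.
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=aoi_geojson,  # hypothetical AOI geometry (GeoJSON dict)
    datetime="2018-01-01/2018-12-31",
)
s2_items = search.item_collection()

# Stack the matching items into one lazy, dask-backed DataArray.
da_sen2 = stackstac.stack(items=s2_items, resolution=10)
```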

@srmsoumya srmsoumya (Collaborator) left a comment

Looks super neat, the datacube pipeline is coming along well.

datacube.py (outdated):

```python
"args": [
    {
        "op": "s_intersects",
        "args": [{"property": "geometry"}, geom_CENTROID.__geo_interface__],
```

@srmsoumya (Collaborator):

CENTROID is a Point object; maybe we can use that directly instead of geom_CENTROID.
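
i.e., something like this (sketch; assumes CENTROID is a shapely Point, which already exposes __geo_interface__):

```python
# Sketch: pass the Point's GeoJSON mapping straight into the CQL2 filter.
filter_arg = {
    "op": "s_intersects",
    "args": [{"property": "geometry"}, CENTROID.__geo_interface__],
}
```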

@lillythomas (Contributor, Author):

Nice! Yes we can. Implemented.

datacube.py (outdated):

```python
"""
da_sen2: xr.DataArray = stackstac.stack(
    items=s2_items[0],
    epsg=26910,  # UTM Zone 10N
```

@srmsoumya (Collaborator):

Can we read the projection from the s2 item collection?

@lillythomas (Contributor, Author):

I assigned this to the same projection used for the S2 item.

datacube.py (outdated), comment on lines 296 to 307:
da_sen1.attrs["spec"] = str(da_sen1.spec)

# To fix ValueError: unable to infer dtype on variable None
for key, val in da_sen1.coords.items():
if val.dtype == "object":
print("Deleting", key)
da_sen1 = da_sen1.drop_vars(names=key)

# Create xarray.Dataset datacube with VH and VV channels from SAR
da_vh: xr.DataArray = da_sen1.sel(band="vh", drop=True).rename("vh")
da_vv: xr.DataArray = da_sen1.sel(band="vv", drop=True).rename("vv")
ds_sen1: xr.Dataset = xr.merge(objects=[da_vh, da_vv], join="override")
@srmsoumya (Collaborator):

`da_sen1 = stackstac.mosaic(da_sen1, dim="time")` will create a composite of the S1 tiles overlapping the S2 bbox. We can use that directly in the merge and don't have to worry about a pixel-picking strategy.

@lillythomas (Contributor, Author):

Word. I like this. Thanks.

@weiji14 (Contributor):

So stackstac.mosaic returns an xarray.DataArray, whereas xarray.merge returns an xarray.Dataset. The main difference is that xarray.Dataset allows us to clearly show named data variables (e.g. VV, VH, B1, B2, etc), whereas xarray.DataArray stores these in the 'band' dimension. Do we have a preference for either format?

@srmsoumya (Collaborator):

Having named data vars helps. stackstac.mosaic takes care of merging the pixels along the given dim (time in our case), which we don't have to handle at our end.
What if we create the data arrays and then concat them together as a dataset with vars attached?
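
A sketch of that idea, assuming da_sen1 is the stacked S1 DataArray with a "band" dimension (xarray's to_dataset can promote a dimension to named data variables):

```python
# Sketch: composite along time, then promote the band dim to named data vars.
da_sen1 = stackstac.mosaic(da_sen1, dim="time")
ds_sen1 = da_sen1.to_dataset(dim="band")  # yields named "vh" and "vv" variables
```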

@lillythomas (Contributor, Author):

We still retain the named data vars, I think. Merged data array using the mosaic method for Sentinel-1:

```
Merged datarray:  <xarray.Dataset>
Dimensions:                                     (time: 1, band: 14, x: 11365,
                                                 y: 11363)
Coordinates: (12/67)
  * time                                        (time) datetime64[ns] 2018-06...
  * band                                        (band) <U4 'B02' 'B03' ... 'vv'
  * x                                           (x) float64 2.973e+05 ... 4.1...
  * y                                           (y) float64 4.202e+06 ... 4.0...
    id                                          (time) <U54 'S2B_MSIL2A_20180...
    s2:generation_time                          <U24 '2020-10-12T02:49:45.926Z'
    ...                                          ...
    sar:instrument_mode                         <U2 'IW'
    sar:product_type                            <U3 'GRD'
    s1:product_timeliness                       <U8 'Fast-24h'
    s1:resolution                               <U4 'high'
    description                                 (band) object nan ... 'Terrai...
    proj:shape                                  object {3600}
Data variables:
    stackstac-95bc7f7975c3ca621cbffb094f33316e  (time, band, y, x) float32 dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
    stackstac-80df347b660c0b771e7ba977f16777bc  (band, y, x) float32 dask.array<chunksize=(13, 1024, 1024), meta=np.ndarray>
    stackstac-602439b1960e897b8044c8056eb3ce96  (band, y, x) float32 dask.array<chunksize=(14, 1024, 1024), meta=np.ndarray>
Attributes:
    spec:        RasterSpec(epsg=32611, bounds=(297340, 4088320, 410990, 4201...
    crs:         epsg:32611
    transform:   | 10.00, 0.00, 297340.00|\n| 0.00,-10.00, 4201950.00|\n| 0.0...
    resolution:  10
```

datacube.py (outdated), comment on lines 182 to 184:
```python
# TODO: Find a way to get minimal number of S1 tiles to cover the entire S2 bbox
s1_gdf["overlap"] = s1_gdf.intersection(box(*BBOX)).area
s1_gdf.sort_values(by="overlap", inplace=True)
```
@srmsoumya (Collaborator):

Pick tiles from the ascending or descending capture based on their overlap with the S2 bbox:

```python
# Pick whichever orbit state (ascending or descending) has the larger total overlap.
state = (
    s1_gdf[["sat:orbit_state", "overlap"]]
    .groupby(["sat:orbit_state"])
    .sum()
    .sort_values(by="overlap", ascending=False)
    .index[0]
)

# Then filter s1_gdf down to that orbit state.
s1_gdf = s1_gdf[s1_gdf["sat:orbit_state"] == state]
```

@lillythomas (Contributor, Author):

Implemented

datacube.py (outdated)

@weiji14 (Contributor):

Do we want to have this file in the top-level directory, or in another folder?

@lillythomas (Contributor, Author):

No, I think we can have this in a more appropriate subdir. I just put it here since we haven't decided on the structure yet. I'm happy to make the call, and we can modify as we see fit.

@weiji14 (Contributor):

Maybe put it in a subdir; I'll leave the naming up to you 🙂

@weiji14 (Contributor):

Actually, I think @yellowcap is putting things under scripts/ in #29, so we could follow that?

datacube.py (outdated):

```python
return da_merge


def process(year1, year2, aoi, resolution, epsg):
```
@weiji14 (Contributor):

Not sure about needing to pass in an EPSG code here. For wide areas (in the West-to-East direction) that can span multiple UTM zones, wouldn't this force a single EPSG instead of multiple?

@srmsoumya (Collaborator):

We are trying to overlay data from S1 & DEMs onto S2 tiles. As S2 tiles follow the MGRS format, each tile will fall under a single UTM zone (this is my understanding, correct me if I am wrong).
S1 & DEMs might span multiple UTM zones, but stackstac should reproject all of them together.

@lillythomas (Contributor, Author):

That makes sense to me. I guess I should switch this to just read the projection based on the MGRS tile's UTM zone instead of having a user argument for it.

@lillythomas (Contributor, Author):

Reading the EPSG code directly from the Sentinel-2 item properties now.
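
Something along these lines (sketch; assumes the items carry the STAC projection extension's proj:epsg property, which Sentinel-2 L2A catalogs typically do):

```python
# Sketch: derive the EPSG code from the first S2 item instead of a user argument.
epsg = s2_items[0].properties["proj:epsg"]
```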

datacube.py (outdated):

```python
    s2_items, s1_items, dem_items, BBOX, resolution, epsg
)

da_merge = merge_datarrays(da_sen2, da_sen1, da_dem)
```
@weiji14 (Contributor):

As a sanity check, could you plot each individual band from this datacube and see if they look OK? I've had issues with striped outputs when running xr.merge before on Sentinel-1 and Copernicus DEM, and just want to double-check.
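
For a quick visual check, something like this sketch (assumes da_merge is a DataArray with a "band" dimension):

```python
# Sketch: plot every band to eyeball striping or nodata artifacts.
import matplotlib.pyplot as plt

for band in da_merge.band.values:
    da_merge.sel(band=band).squeeze().plot.imshow(robust=True)
    plt.title(str(band))
    plt.show()
```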

@weiji14 (Contributor):

Should be addressed in #27 (comment)

@lillythomas (Contributor, Author):

Ok. As of now, we are able to confidently generate a datacube with S2, S1 and DEM bands concatenated. The time dimension is that of the "best" S2 image in terms of cloud coverage, and the S1 images are all of the same orbit state.
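
The "best" pick can be as simple as this sketch (assumes the S2 items expose the standard eo:cloud_cover property):

```python
# Sketch: keep the timestamp of the least-cloudy Sentinel-2 item.
best_s2 = min(s2_items, key=lambda item: item.properties["eo:cloud_cover"])
```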

Some validation results:

All bands:
[screenshot: list of all band names]

Time var:
[screenshot: time coordinate value]

Some subset plots from each sensor:
[four screenshots: subset plots from the S2, S1 and DEM bands]

@weiji14 @srmsoumya if it looks good to you both, we can merge and move on to the tiling part.

@weiji14 weiji14 (Contributor) left a comment

Thanks Lilly, I think we should try to merge this by today. Just a few minor suggestions, and we can improve on stuff in follow-up PRs.

@weiji14 weiji14 mentioned this pull request Nov 14, 2023
@weiji14 weiji14 (Contributor) left a comment

Linter is complaining a bit, but we can fix those later since the script will need some refining. Thanks again @lillythomas, and also @srmsoumya for reviewing!

@lillythomas lillythomas merged commit 3608d6d into main Nov 15, 2023
1 of 2 checks passed
@lillythomas lillythomas deleted the datacube_fusion branch November 15, 2023 00:15
@weiji14 weiji14 added the data-pipeline Pull Requests about the data pipeline label Nov 20, 2023
Successfully merging this pull request may close these issues:

  • DEM specs and retrieval
  • Sentinel 1 input spec and retrieval
  • Sentinel 2 input spec and retrieval