## Example notebook
Reading SpatioTemporal Asset Catalogs (STAC) and computing zonal statistics for target areas through time.

First, we import the modules this notebook needs. If any are missing, run `pip install -r requirements.txt` from the root directory.

In [1]:
1+1

2

In [3]:
import geopandas as gpd
import sys
sys.path.append("..") # this is only required when the imports are a level above the current file, typically not required
import utilities
import zonal_statistics



Have a quick look at the geopackage that holds our polygon layer.

In [13]:
display(utilities.list_all_layers_in_geopackage('../inputs/h3.gpkg'))

['h3_level10',
 'h3_level8',
 'h3_level7',
 'h3_level6',
 'h3_level5',
 'h3',
 'h3_elliott_river']
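The implementation of `utilities.list_all_layers_in_geopackage` is not shown in this notebook. As a rough illustration of how such a listing can work, a GeoPackage is a SQLite database whose `gpkg_contents` table registers every layer. The sketch below uses only the standard library and an in-memory stand-in database; the function name `list_feature_layers` and the demo rows are assumptions made for this example.

```python
import sqlite3


def list_feature_layers(connection):
    """Return the names of vector layers registered in a GeoPackage.

    The `gpkg_contents` table lists every layer in the file; rows with
    data_type = 'features' are vector (polygon, line, point) layers.
    """
    rows = connection.execute(
        "SELECT table_name FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    )
    return [name for (name,) in rows]


# Build a tiny in-memory stand-in for a GeoPackage (illustrative only).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
demo.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [
        ("h3_level8", "features"),
        ("h3_elliott_river", "features"),
        ("dem", "tiles"),  # hypothetical raster layer, filtered out below
    ],
)

layers = list_feature_layers(demo)
print(layers)
```

Against a real file, the same function could be called with `sqlite3.connect('../inputs/h3.gpkg')` in place of the in-memory connection.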

Then we identify our target polygons. These will form the underlying rows of our analysis.

In [19]:
gdf = gpd.read_file('../inputs/h3.gpkg', layer='h3_elliott_river')
gdf.head()

Unnamed: 0,GRID_ID,level,within0km,within10km,MERGE_SRC,geometry
0,85be8b63fffffff,5,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16956166.438 -2875341.283, 169..."
1,85be8b73fffffff,5,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16970052.456 -2890108.298, 169..."
2,85be8b77fffffff,5,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16950083.811 -2894649.742, 169..."
3,86be8b607ffffff,6,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16948489.353 -2874524.713, 169..."
4,86be8b627ffffff,6,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16951580.613 -2881500.752, 169..."


In [25]:
gdf[gdf['level'] == 8]

Unnamed: 0,GRID_ID,level,within0km,within10km,MERGE_SRC,geometry
50,88be8b6041fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16947392.757 -2874407.752, 169..."
51,88be8b6045fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16948048.067 -2873528.549, 169..."
52,88be8b6049fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16947834.017 -2875404.034, 169..."
53,88be8b604bfffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16946737.347 -2875287.01, 1694..."
54,88be8b604dfffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16948489.353 -2874524.713, 169..."
...,...,...,...,...,...,...
188,88be8b7761fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16949855.677 -2891771.928, 169..."
189,88be8b7763fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16948757.355 -2891654.761, 169..."
190,88be8b7765fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16950511.828 -2890890.925, 169..."
191,88be8b7767fffff,8,1,,Y:\Projects\Ben\ForestHealth\ForestHealth.gdb\...,"MULTIPOLYGON (((16949413.58 -2890773.821, 1694..."
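The `level` column appears to track the resolution of each cell. Assuming the `GRID_ID` values are standard hexadecimal H3 cell indexes (suggested by the layer names, but not confirmed in this notebook), the resolution is stored in bits 52 to 55 of the index and can be read directly from the ID. The helper name `h3_resolution` is an assumption for this sketch.

```python
def h3_resolution(cell_id: str) -> int:
    """Read the resolution stored in bits 52-55 of a hexadecimal H3 cell index."""
    return (int(cell_id, 16) >> 52) & 0xF


# IDs copied from the tables above, one per level shown.
for cell_id in ["85be8b63fffffff", "86be8b607ffffff", "88be8b6041fffff"]:
    print(cell_id, h3_resolution(cell_id))
```

This gives a quick sanity check that the `level` attribute agrees with the IDs before filtering on it.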


Excellent! We can see our features, and now we are ready to analyse satellite data in these areas. First, let's define the target STAC dataset we are looking for. [../resources.json](../resources.json) is a file made by Ben Ross that defines a few resources. You can modify these attributes as you please.

In [7]:
# Load the resource metadata once and reuse it below.
resources = utilities.fetch_resource_metadata("../resources.json")

# STAC API URL from the resource metadata.
url = resources['url']

# Name of the sensor at index 1 (the second entry in the sensor list).
sensor_name = resources['sensors'][1]['name']

# Bands to fetch from the STAC API, taken from that sensor's band definitions.
bands = list(resources['sensors'][1]['bands'][0].values())

# Bounds must be in EPSG 4326 for the STAC API search.
bounds = gdf.to_crs('EPSG:4326').total_bounds.tolist()

In [9]:
utilities.fetch_resource_metadata("../resources.json")['sensors'][1]['common_name']

'Sentinel-2B'

Now that we know what we are searching for, let's generate the virtual array that contains the data by conducting a search.

In [6]:
# This searches for data within the bounding box of the gdf and within the specified time range.
data = utilities.get_data_from_stac(url, bounds, sensor_name, bands, time_range="2025-01-01/2025-12-31")
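How `utilities.get_data_from_stac` performs the search is not shown here. For orientation, a STAC API item search accepts a JSON body with `collections`, `bbox`, `datetime` and `limit` fields. The sketch below only builds and validates such a body without contacting any server; the function name, the collection name and the bounding box values are placeholders chosen for illustration, not values from `resources.json`.

```python
import json


def build_stac_search_body(collection, bbox, time_range, limit=100):
    """Build the JSON body for a STAC API item search.

    bbox must be [west, south, east, north] in EPSG:4326 degrees, which is
    why projected bounds have to be reprojected before searching.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    west, south, east, north = bbox
    in_range = (
        -180 <= west <= 180
        and -180 <= east <= 180
        and -90 <= south <= north <= 90
    )
    if not in_range:
        raise ValueError("bbox must be [west, south, east, north] in degrees")
    return {
        "collections": [collection],
        "bbox": [west, south, east, north],
        "datetime": time_range,  # closed interval: "start/end"
        "limit": limit,
    }


# Placeholder collection name and bounds, for illustration only.
body = build_stac_search_body(
    "example_collection",
    [152.2, -25.2, 152.5, -24.9],
    "2025-01-01/2025-12-31",
)
print(json.dumps(body, indent=2))
```

Passing bounds that are still in a projected CRS, such as the metre-based coordinates in the geometry column, raises a `ValueError` rather than silently producing an empty search.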

Unless we want to download data for each and every day, let's resample the data to a monthly median so there are fewer rows to download and calculate.

In [7]:
# This resamples the fetched data to monthly frequency.
data_monthly = utilities.resample_stac_data_to_data_monthly(data)
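The internals of `utilities.resample_stac_data_to_data_monthly` are not shown, so the following is only a conceptual sketch of a monthly median: observations are grouped by calendar month and each group is reduced to its median. The function name `monthly_median` and the sample values are made up for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def monthly_median(observations):
    """Group (date, value) pairs by calendar month and return each month's median."""
    by_month = defaultdict(list)
    for day, value in observations:
        by_month[(day.year, day.month)].append(value)
    return {month: median(values) for month, values in sorted(by_month.items())}


# Made-up reflectance values for a single pixel.
observations = [
    (date(2025, 1, 3), 100),
    (date(2025, 1, 13), 300),
    (date(2025, 1, 23), 200),
    (date(2025, 2, 2), 400),
    (date(2025, 2, 12), 600),
]

result = monthly_median(observations)
print(result)
```

Five acquisitions collapse to two monthly values here. A median is a common choice for this reduction because a single cloudy or otherwise anomalous acquisition has little effect on it.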

Now all we have to do is run the primary `zonal_statistics.compute_zonal_stats_bands_vectorized()` function, which writes our results as a set of CSV files in the output folder.

In [None]:
zonal_statistics.compute_zonal_stats_bands_vectorized(
    data_monthly=data_monthly,
    gdf=gdf,
    key_column_name='GRID_ID',
    bands=bands,
    output_dir="./example_outputs",
    overwrite=True)



Processing 12 time steps for 505 features
Bands: ['nbart_blue', 'nbart_green', 'nbart_red', 'nbart_nir_1', 'nbart_swir_2', 'nbart_swir_3']


100%|██████████| 12/12 [13:15<00:00, 66.31s/it]


Complete. Processed: 6060, Errors: 0, No data: 0, Skipped: 0
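To make the idea of a zonal statistic concrete, here is a minimal sketch that is independent of the project's implementation: each raster cell carries a value and a zone label, and the statistic is computed per label. The mean is used purely for illustration, since the notebook does not state which statistics the function computes; `zonal_mean` and the sample grids are assumptions.

```python
from collections import defaultdict
from statistics import mean


def zonal_mean(values, zones, nodata=None):
    """Average raster cells per zone.

    `values` and `zones` are same-shaped 2-D lists. A zone of None marks a
    cell outside every polygon; cells equal to `nodata` are ignored.
    """
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None and value != nodata:
                cells[zone].append(value)
    return {zone: mean(zone_values) for zone, zone_values in cells.items()}


# A 2 x 3 band with two zones and one cell outside both.
band = [[2, 4, 10], [6, 99, 20]]
zone_grid = [["a", "a", "b"], ["a", None, "b"]]

zone_means = zonal_mean(band, zone_grid)
print(zone_means)
```

In the run above, the same kind of reduction is repeated for every feature and time step, which is consistent with 505 features over 12 time steps giving 6060 processed results.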


Now that all of our data is downloaded and calculated, let's combine the files into a single large file, which is much easier to work with.

In [9]:
import combineCSV

In [10]:
combineCSV.compile_csvs(
    output_dir="./example_outputs",
    pattern="BANDS*.csv",
    combined_filename="combined.csv",
    key_column_name='GRID_ID',
    recursive=False,
    verbose=False)
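The source of `combineCSV.compile_csvs` is not included in this notebook. A minimal standard-library sketch of the same idea, assuming every matched file shares one header row, might look like the following; the function name `combine_csvs`, the demo file names and the demo values are hypothetical.

```python
import csv
import glob
import os
import tempfile


def combine_csvs(output_dir, pattern, combined_filename):
    """Concatenate CSV files that share a header into one file; return the row count."""
    combined_path = os.path.join(output_dir, combined_filename)
    # Sort for a stable order and never read the combined file back in.
    paths = sorted(
        path
        for path in glob.glob(os.path.join(output_dir, pattern))
        if path != combined_path
    )
    writer = None
    rows_written = 0
    with open(combined_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.DictReader(src)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demonstrate on two small, made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_files = {
        "BANDS_2025-01.csv": ["85be8b63fffffff", "1"],
        "BANDS_2025-02.csv": ["85be8b63fffffff", "2"],
    }
    for name, row in demo_files.items():
        with open(os.path.join(demo_dir, name), "w", newline="") as handle:
            demo_writer = csv.writer(handle)
            demo_writer.writerow(["GRID_ID", "value"])
            demo_writer.writerow(row)
    total = combine_csvs(demo_dir, "BANDS*.csv", "combined.csv")
    with open(os.path.join(demo_dir, "combined.csv"), newline="") as handle:
        combined_rows = list(csv.reader(handle))

print(total, combined_rows)
```

The header is written once and every data row follows it, so the combined file can be loaded in a single read.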