# Sentinel-1 Data Audit MSPC

Starting from the Parquet listing of all scenes, filter to a bounding box (bbox) over Australia, then count the available scenes per year.

The [Sentinel-1 GRD metadata is here](https://planetarycomputer.microsoft.com/dataset/sentinel-1-grd).

A Parquet file is available at `abfs://items/sentinel-1-grd.parquet`. The code below resolves it through the collection's `geoparquet-items` asset.

In [1]:
import dask.dataframe as dd
import dask_geopandas
import geopandas as gpd
from planetary_computer import sign_inplace
from pystac_client import Client
from shapely.geometry import box

# Local helper module; bbox is used below as (xmin, ymin, xmax, ymax)
from utils import bbox

In [None]:
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=sign_inplace,
)

asset = catalog.get_collection("sentinel-1-grd").assets["geoparquet-items"]

s1grd = dd.read_parquet(
    asset.href, storage_options=asset.extra_fields["table:storage_options"]
)
s1grd.head()

In [None]:
# Create a GeoDataFrame with the bounding box and check it on a map
gdf = gpd.GeoDataFrame(geometry=[box(*bbox)], crs="EPSG:4326")
gdf.explore()

In [None]:
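# Filter with bounding box values.
# This is "intersects" logic: a scene is kept when its box overlaps the
# query box on both axes (scene max > query min and scene min < query max),
# so scenes that only partly overlap the query box are included.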
filtered = s1grd.loc[
    (s1grd.bbox.str[2] > bbox[0]) &
    (s1grd.bbox.str[3] > bbox[1]) &
    (s1grd.bbox.str[0] < bbox[2]) &
    (s1grd.bbox.str[1] < bbox[3])
]
filtered
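
The four comparisons in the filter above are the standard overlap test for axis-aligned boxes. Here is a minimal sketch of the same test on plain tuples, assuming the `(xmin, ymin, xmax, ymax)` ordering. The function name and the coordinates are hypothetical and are not the values from `utils.bbox`. Like the filter, it uses strict inequalities, so boxes that only touch along an edge do not count, and it does not handle boxes that cross the antimeridian.

```python
def bboxes_intersect(a, b):
    """Return True if two (xmin, ymin, xmax, ymax) boxes overlap with non-zero area."""
    return a[2] > b[0] and a[3] > b[1] and a[0] < b[2] and a[1] < b[3]


# Hypothetical query box, for illustration only
aoi = (112.0, -44.0, 154.0, -10.0)

inside = (130.0, -25.0, 132.0, -23.0)      # fully inside the query box
straddling = (110.0, -12.0, 113.0, -9.0)   # overlaps the north-west corner
outside = (160.0, -40.0, 165.0, -35.0)     # entirely east of the query box
touching = (154.0, -20.0, 156.0, -18.0)    # shares only the eastern edge

results = {
    "inside": bboxes_intersect(inside, aoi),
    "straddling": bboxes_intersect(straddling, aoi),
    "outside": bboxes_intersect(outside, aoi),
    "touching": bboxes_intersect(touching, aoi),
}
```

The test is symmetric, so swapping the two arguments gives the same answer.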

In [11]:
count_per_year = filtered.groupby(filtered.datetime.dt.year).size().compute()
count_per_year

datetime
2014      259
2015     2328
2016     4142
2017     9342
2018     9524
2019    10299
2020    10347
2021    10415
2022     7389
2023     4201
2024      139
dtype: int64
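
The counts above are only displayed, not saved. One possible way to write them to disk is sketched below using only the standard library. The helper name, column names and file name are assumptions, and the sample dictionary copies the first three rows of the output above as a stand-in for the real result.

```python
import csv
import os
import tempfile


def write_counts_csv(pairs, path):
    """Write (year, count) pairs to a two-column CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "scene_count"])
        for year, count in pairs:
            # Cast to plain int so NumPy integer types serialise cleanly
            writer.writerow([int(year), int(count)])


# First three rows of the output above, standing in for count_per_year.items()
sample = {2014: 259, 2015: 2328, 2016: 4142}

# Hypothetical output location in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "s1_grd_counts_per_year.csv")
write_counts_csv(sample.items(), out_path)
```

In the notebook, the same helper could be called as `write_counts_csv(count_per_year.items(), "s1_grd_counts_per_year.csv")`. Calling `count_per_year.to_csv(...)` directly is the shorter pandas route.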