# Working with Sea Surface Temperature Timeseries from STAC Parquet

This notebook follows on from the previous notebooks by demonstrating how to load a data product that has been produced using cloud-native methods.

In the previous two examples, we used STAC APIs to access the data.
For smaller datasets, a data provider can instead publish a static Parquet file that acts as an index of all the STAC items, allowing end users to query the data without the provider needing to maintain a server.

In this notebook, we'll access the [Global Foundation Sea Surface Temperature Analysis dataset](https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1).
The dataset is created and distributed by NASA's Jet Propulsion Laboratory in collaboration with the Group for High Resolution Sea Surface Temperature.
The Australian Antarctic Division has [worked to convert the data into Cloud Optimised GeoTIFFs](https://github.com/AustralianAntarcticDivision/ghrsst-cogger/tree/main) and enable access to them through a STAC Parquet file, which is hosted by [Source Cooperative](https://source.coop/repositories/ausantarctic/ghrsst-mur-v2/description).


## Set up

The first step is to set up the required Python libraries.

* `stacrs` is used to query the Parquet file housing the index of the sea surface temperature data
* `pystac` is used to create a list of STAC items
* `odc.stac` is used to load the list of items
* `odc.geo` allows us to create a bounding box for the STAC query

In [None]:
import stacrs
import pystac
from odc.stac import load
from odc.geo import BoundingBox

The second step is to start a Dask client.

Dask supports local parallel processing and can help speed up computation times.

In [None]:
from dask.distributed import Client as DaskClient

dask_client = DaskClient(n_workers=2, threads_per_worker=16)
dask_client

## Part 1: Find

### 1.1 Setting the source URL for the STAC Parquet file

Rather than providing a URL to an API and using `pystac-client` to make a connection, we can specify the URL to the Parquet file that we wish to read from.

In [None]:
sst_parquet_url = "https://data.source.coop/ausantarctic/ghrsst-mur-v2/ghrsst-mur-v2.parquet"

### 1.2 Identifying all STAC items from the Parquet file

This dataset is relatively small, and all items can be read into a list.
This list contains the information that `odc-stac` will use to load data from the Cloud Optimised GeoTIFFs described by the STAC items.

In [None]:
items = await stacrs.read(sst_parquet_url)
items = [pystac.Item.from_dict(i) for i in items["features"]]

print(f"Found {len(items)} items")
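
Because the index is read into an ordinary list, it can also be narrowed on the client side before any pixels are loaded. The following is a minimal sketch of filtering by date, using plain dictionaries as stand-ins for the real STAC items; the sample ids and dates are made up, and `parse_utc` and `filter_by_date` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for STAC items: only the fields used here are included.
sample_items = [
    {"id": "sst-2019-12-31", "properties": {"datetime": "2019-12-31T09:00:00Z"}},
    {"id": "sst-2020-01-01", "properties": {"datetime": "2020-01-01T09:00:00Z"}},
    {"id": "sst-2020-06-15", "properties": {"datetime": "2020-06-15T09:00:00Z"}},
]


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def filter_by_date(items, start: datetime, end: datetime):
    """Keep items whose datetime falls within [start, end]."""
    return [
        item
        for item in items
        if start <= parse_utc(item["properties"]["datetime"]) <= end
    ]


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
end = datetime(2020, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
selected = filter_by_date(sample_items, start, end)
selected_ids = [item["id"] for item in selected]
print(selected_ids)
```

With `pystac.Item` objects, the parsed timestamp is available as `item.datetime`, so the string parsing step would not be needed.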

### 1.3 Selecting an area to load

The sea surface temperature data is available globally.
For this notebook, we will look at Prydz Bay in Antarctica, a region that houses many scientific research stations.

In [None]:
longitude_center = 75.19
longitude_buffer = 5

latitude_center = -68.99
latitude_buffer = 1.5


bbox = BoundingBox(
    left=longitude_center-longitude_buffer,
    bottom=latitude_center-latitude_buffer,
    right=longitude_center+longitude_buffer,
    top=latitude_center+latitude_buffer,
    crs="EPSG:4326",
)

bbox.explore()
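
The bounding box is simply the centre point plus and minus a buffer in each direction. As a quick check of the arithmetic, here is a small sketch in plain Python; `bbox_from_center` is a hypothetical helper and is not part of `odc.geo`.

```python
def bbox_from_center(lon_center, lat_center, lon_buffer, lat_buffer):
    """Return (left, bottom, right, top) in degrees around a centre point."""
    return (
        lon_center - lon_buffer,
        lat_center - lat_buffer,
        lon_center + lon_buffer,
        lat_center + lat_buffer,
    )


left, bottom, right, top = bbox_from_center(75.19, -68.99, 5, 1.5)
print(f"Longitude: {left:.2f} to {right:.2f}")
print(f"Latitude: {bottom:.2f} to {top:.2f}")
```

With the values used in this notebook, the box spans 10 degrees of longitude and 3 degrees of latitude.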

## Part 2: Load

### 2.1 Using odc-stac to load data that intersects with the bounding box

In [None]:
data = load(
    items,
    bbox=bbox,
    chunks={},
    measurements=["analysed_sst"],
    fail_on_error=False,
)
data

### 2.2 Extract metadata
For each STAC item, it's possible to extract additional metadata that's stored in the `raster:bands` extension. 
The additional metadata are:
* the nodata value
* the data type
* the scale
* the offset
* the unit

In [None]:
sst_metadata_dict = items[0].assets["analysed_sst"].extra_fields["raster:bands"][0]
sst_metadata_dict

Loading the metadata shows that the original product is provided in kelvin, and that it has been stored as an integer using a scale and offset value.

## Part 3: Visualise

### 3.1 Prepare data for visualisation

There are three important steps we can do to get the data ready for visualisation:
* Mask any nodata values
* Apply the scale and offset so we're working with float values in Kelvin
* Convert the values in Kelvin to values in Celsius
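
The three steps above can be sketched on a handful of plain Python numbers before applying them to the full array. The scale, offset and nodata values below are assumptions chosen for illustration; the real values should be read from the `raster:bands` metadata. `decode_to_celsius` is a hypothetical helper name.

```python
import math

# Assumed values for illustration only; read the real ones from `raster:bands`.
SCALE = 0.001
OFFSET = 298.15
NODATA = -32768


def decode_to_celsius(raw, scale=SCALE, offset=OFFSET, nodata=NODATA):
    """Mask nodata, apply scale and offset to get Kelvin, then convert to Celsius."""
    if raw == nodata:
        return math.nan  # masked values become NaN, as with xarray's .where()
    kelvin = raw * scale + offset
    return kelvin - 273.15


raw_values = [-25000, -23500, NODATA]
celsius_values = [decode_to_celsius(value) for value in raw_values]
print(celsius_values)
```

Under these assumed values, a stored integer of -25000 decodes to roughly 273.15 K, which is about 0 °C.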

In [None]:
# Mask nodata
sst = data.analysed_sst.where(data.analysed_sst != sst_metadata_dict["nodata"])

In [None]:
# Extract scale and offset values, then apply
sst_scale = sst_metadata_dict["scale"]
sst_offset = sst_metadata_dict["offset"]

sst_scaled_kelvin = (sst * sst_scale) + sst_offset

In [None]:
# Define the offset to convert from Kelvin to Celsius, then apply
k_to_c = -273.15

sst_scaled_celcius = sst_scaled_kelvin + k_to_c

### 3.2 Visualise a single date

It's often valuable to review a single image to check that the masking and scaling have been applied correctly.

In [None]:
sst_scaled_celcius.isel(time=-1).plot.imshow()

### 3.3 Calculate and plot the monthly maximum temperature over time

For the next visualisations, we'll select five years of data and calculate the monthly maximum for each pixel.
This allows us to see monthly patterns, and then further summarise to see the overall maximum for each month across the five years.

In [None]:
# Select five years of data
sst_subset = sst_scaled_celcius.sel(time=slice("2020-01-01", "2024-12-31"))

# Calculate the max value of each pixel for each month. Apply .compute() to keep in memory for visualisation
monthly_max = sst_subset.resample(time="1ME").max().compute()
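
Conceptually, `resample(time="1ME").max()` groups the observations by calendar month and keeps the largest value in each group. The idea can be sketched for a single pixel in plain Python; the daily values below are made up, and `monthly_maximum` is a hypothetical helper rather than anything provided by xarray.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily temperatures (°C) for a single pixel.
daily = [
    (date(2020, 1, 5), -1.2),
    (date(2020, 1, 20), 0.4),
    (date(2020, 2, 3), -0.8),
    (date(2020, 2, 17), -1.5),
]


def monthly_maximum(samples):
    """Group (date, value) pairs by calendar month and keep the largest value."""
    groups = defaultdict(list)
    for day, value in samples:
        groups[(day.year, day.month)].append(value)
    return {month: max(values) for month, values in sorted(groups.items())}


monthly = monthly_maximum(daily)
print(monthly)
```

xarray performs the same grouping for every pixel at once and, because the data is Dask-backed, only does the work when `.compute()` is called. The `ME` (month end) frequency alias assumes a recent pandas release; older releases used `M`.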

#### 3.3.1 Pixel-level maximum for each month in 2020

In [None]:
monthly_max.sel(time=slice("2020-01-01", "2020-12-31")).plot.imshow(col="time", col_wrap=6)

#### 3.3.2 Monthly overall maximum for five years

In [None]:
monthly_max.max(dim=["longitude", "latitude"]).plot.line()

## Part 4: Tidy up

In this section, we close the Dask client.
This prevents multiple clients being instantiated when using different notebooks.

In [None]:
dask_client.close()