# Analyzing MUR SST with Coiled 

The Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) dataset consists of global, 1 km, daily data and is part of the [AWS Public Dataset Program](https://registry.opendata.aws/mur/). This is a very large dataset, and the analyses below can take 5 to 10 minutes to run.

This notebook demonstrates how to:

- Create a Dask cluster with Coiled
- Load a terabyte-scale dataset hosted on AWS S3
- Use Xarray to perform calculations

# Start up a Dask cluster with Coiled

In [None]:
import coiled

cluster = coiled.Cluster(n_workers=30, configuration="coiled-examples/pangeo")

In [None]:
from dask.distributed import Client

client = Client(cluster)
client

#### ☝️ Don’t forget to click the "Dashboard" link above to view the cluster dashboard!

# Opening the data

In [None]:
import warnings
import numpy as np
import pandas as pd
import xarray as xr
import fsspec

warnings.simplefilter('ignore')  # filter some warning messages
xr.set_options(display_style="html")  # display datasets nicely

Note: Some shortcomings in s3fs and zarr have been identified and reported to the developers as GitHub issues [here](https://github.com/dask/s3fs/issues/285) and [here](https://github.com/zarr-developers/zarr-python/issues/536). Currently, accessing the complete metadata takes several minutes.

In [None]:
%%time

ds_sst = xr.open_zarr(fsspec.get_mapper("s3://mur-sst/zarr", anon=True), consolidated=True)
ds_sst

This is a _very_ large dataset, at over 66 TB.

In [None]:
ds_sst.nbytes / 1e12
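
As a rough cross-check on that number, the size can be estimated by hand from the array shape. The sketch below is a back-of-the-envelope calculation, not output from the dataset: the dimension lengths, the number of data variables, and the 4-byte value size are all assumptions that should be compared against the `ds_sst` repr above. Coordinate arrays are ignored because they are negligible at this scale.

```python
# Assumed values: verify each one against the ds_sst repr before relying on them.
n_time, n_lat, n_lon = 6443, 17999, 36000  # assumed dimension lengths
n_variables = 4                            # assumed number of data variables
bytes_per_value = 4                        # assumed float32 after decoding

# Every data variable is assumed to span the full (time, lat, lon) grid.
values_per_variable = n_time * n_lat * n_lon
total_bytes = values_per_variable * n_variables * bytes_per_value
total_tb = total_bytes / 1e12

print(f"{total_tb:.1f} TB")
```

If the assumed shape is right, the estimate lands in the same range as `ds_sst.nbytes / 1e12`, which is the in-memory (uncompressed) size rather than the compressed size stored on S3.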

# Data filtering

- The ice mask used by MUR SST is from NSIDC and is based on satellite passive microwave estimates of sea ice concentration
- The satellite data isn't available near land, so there is no estimate of sea ice concentration near land
- As a result, some SSTs near land are erroneous and likely represent ice, which is something to be aware of

In [None]:
sst = ds_sst['analysed_sst']
# Keep cells where mask == 1 and the sea ice fraction is below 15% or undefined (NaN)
cond = (ds_sst.mask == 1) & ((ds_sst.sea_ice_fraction < .15) | np.isnan(ds_sst.sea_ice_fraction))
sst_masked = sst.where(cond)
sst_masked
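
To see what this filter does without touching the full dataset, here is a minimal sketch on a four-cell toy dataset. The values are made up for illustration, and the variable names simply mirror those in `ds_sst`. It assumes, as the condition above suggests, that `mask == 1` flags the cells we want to keep.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: four cells with made-up values.
toy = xr.Dataset(
    {
        "analysed_sst": ("x", np.array([290.0, 291.0, 271.5, 285.0], dtype="float32")),
        "mask": ("x", np.array([1.0, 1.0, 1.0, 2.0], dtype="float32")),
        "sea_ice_fraction": ("x", np.array([np.nan, 0.05, 0.80, np.nan], dtype="float32")),
    }
)

# Same logic as the notebook cell: keep cells with mask == 1 whose ice
# fraction is either below 0.15 or missing.
keep = (toy.mask == 1) & ((toy.sea_ice_fraction < 0.15) | np.isnan(toy.sea_ice_fraction))
toy_masked = toy["analysed_sst"].where(keep)

print(toy_masked.values)
```

The first two cells pass the filter, the third is dropped for its high ice fraction, and the fourth is dropped because its mask value is not 1. Dropped cells become NaN rather than being removed, so the array keeps its shape.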

# Computation and plotting

Let's use ``.resample`` and ``.mean`` to determine the average monthly SST.

In [None]:
sst_monthly = sst_masked.resample(time="1MS").mean("time", keep_attrs=True, skipna=False)
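
The `skipna=False` argument matters here because the masked data contains NaN values. The following sketch uses a synthetic two-month daily series (not MUR data) to show the difference: with `skipna=False`, a single missing day makes the whole monthly mean NaN, while `skipna=True` averages the remaining days.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering January and February 2020.
time = pd.date_range("2020-01-01", "2020-02-29", freq="D")
values = np.arange(time.size, dtype="float64")
values[40] = np.nan  # one missing day in February

daily = xr.DataArray(values, coords={"time": time}, dims="time")

# "1MS" groups by calendar month and labels each group with the month start.
strict = daily.resample(time="1MS").mean("time", skipna=False)
lenient = daily.resample(time="1MS").mean("time", skipna=True)

print(strict.values)
print(lenient.values)
```

With the strict setting, a monthly value is only reported when every day in that month is valid, which avoids averages that are biased toward the days that happened to pass the ice filter.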

We can then compute and plot the SST timeseries from 2017-2020 in the Pacific Blob region.

In [None]:
%%time
monthly_timeseries = sst_monthly.sel(lon=-140, 
                                     lat=53,
                                     time=slice('2017-01-01','2020-01-01'))

monthly_timeseries.plot();
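
Note that `.sel` with plain values requires the requested coordinates to match grid labels exactly. If a point of interest does not fall exactly on the grid, passing `method="nearest"` is one alternative. The sketch below illustrates this on a tiny hypothetical grid; the coordinates and values are made up and are not the MUR grid.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid with 0.01 degree spacing.
grid = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    coords={
        "lat": [52.99, 53.0, 53.01],
        "lon": [-140.01, -140.0, -139.99, -139.98],
    },
    dims=("lat", "lon"),
)

# The requested point is off-grid, so an exact .sel would raise a KeyError.
# method="nearest" picks the closest label along each dimension instead.
point = grid.sel(lat=53.004, lon=-140.003, method="nearest")

print(float(point.lat), float(point.lon), float(point))
```

Nearest-neighbor lookup applies to every indexer in the call, so it is best kept separate from slice-based selections such as the `time=slice(...)` used above.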