# Assessment of MUR SST Chunking and Codecs

This notebook analyzes the chunk shapes and codecs used by [MUR SST](https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1) files over different time periods. Zarr does not currently support arrays whose chunk shape, dimensions, or codecs vary, so this notebook documents why a virtual Zarr store must include some native Zarr data to keep chunk shape and codecs consistent.
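
The constraint described above can be sketched in plain Python. This is a minimal illustration, not part of the notebook's `utils` module: the function `split_virtual_and_native` and the small `file_metadata` list are hypothetical, and picking the most common layout as the array-wide layout is an assumption made only for this example.

```python
from collections import Counter

# Hypothetical per-file metadata for one variable:
# (date, chunk shape, codec names). Illustrative values only.
file_metadata = [
    ("2023-02-23", (1, 1023, 2047), ("shuffle", "zlib")),
    ("2023-02-24", (1, 3600, 7200), ("shuffle", "zlib")),
    ("2023-02-25", (1, 3600, 7200), ("shuffle", "zlib")),
    ("2023-03-01", (1, 1023, 2047), ("shuffle", "zlib")),
    ("2023-03-02", (1, 1023, 2047), ("shuffle", "zlib")),
]


def split_virtual_and_native(metadata):
    """Split files into those that can stay virtual and those to rewrite.

    Assumption: the most common (chunk shape, codecs) pair is used as the
    single layout of the Zarr array. Files with any other layout cannot be
    referenced in place and would need to be rewritten as native Zarr.
    """
    layouts = Counter((chunks, codecs) for _, chunks, codecs in metadata)
    target = layouts.most_common(1)[0][0]
    virtual = [d for d, chunks, codecs in metadata if (chunks, codecs) == target]
    native = [d for d, chunks, codecs in metadata if (chunks, codecs) != target]
    return target, virtual, native


target, virtual, native = split_virtual_and_native(file_metadata)
print("array layout:", target)
print("virtual references:", virtual)
print("rewrite as native Zarr:", native)
```

In this sketch only the files whose layout matches the chosen target remain virtual references; the rest are the candidates for native Zarr data.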

## Load Libraries

In [1]:
from concurrent.futures import ThreadPoolExecutor
import utils

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

variables = ['analysed_sst', 'analysis_error', 'mask', 'sea_ice_fraction']

## List DMRPP files + Open with VirtualiZarr

Each file is opened with VirtualiZarr's `open_virtual_dataset`, and the resulting virtual datasets are then used to extract the chunk shape and codecs for each file.

In [2]:
files = utils.list_mur_sst_files("2002-06-01", "2025-04-25")

In [3]:
%%time
with ThreadPoolExecutor() as executor:
    vdss = list(executor.map(utils.open_virtual_dmrpp, files))

CPU times: user 4min 9s, sys: 5.15 s, total: 4min 14s
Wall time: 5min


## Extract periods for which different chunk shapes are used for each variable

In [5]:
chunk_shapes_dict = utils.process_variable_metadata(vdss, variables, 'chunks')

In [7]:
utils.convert_dict_to_df(chunk_shapes_dict, 'chunk shape')

variable,chunk shape,date_ranges
analysed_sst,"(1, 1023, 2047)","[(2002-06-01, 2023-02-23), (2023-03-01, 2023-04-21), (2023-04-23, 2023-09-03), (2024-03-24, 2024-03-24), (2024-06-02, 2025-04-25)]"
analysed_sst,"(1, 3600, 7200)","[(2023-02-24, 2023-02-28), (2023-04-22, 2023-04-22), (2023-09-04, 2024-03-23), (2024-03-25, 2024-06-01)]"
analysis_error,"(1, 1023, 2047)","[(2002-06-01, 2023-02-23), (2023-03-01, 2023-04-21), (2023-04-23, 2023-09-03), (2024-06-02, 2025-04-25)]"
analysis_error,"(1, 3600, 7200)","[(2023-02-24, 2023-02-28), (2023-04-22, 2023-04-22), (2023-09-04, 2024-06-01)]"
mask,"(1, 1447, 2895)","[(2002-06-01, 2021-02-19), (2021-02-22, 2022-11-08), (2022-11-10, 2023-02-23), (2023-03-01, 2023-04-21), (2023-04-23, 2023-09-03)]"
mask,"(1, 1023, 2047)","[(2021-02-20, 2021-02-21), (2022-11-09, 2022-11-09), (2024-06-02, 2025-04-25)]"
mask,"(1, 4500, 9000)","[(2023-02-24, 2023-02-28), (2023-04-22, 2023-04-22), (2023-09-04, 2024-06-01)]"
sea_ice_fraction,"(1, 1447, 2895)","[(2002-06-01, 2021-02-19), (2021-02-22, 2022-11-08), (2022-11-10, 2023-02-23), (2023-03-01, 2023-04-21), (2023-04-23, 2023-09-03)]"
sea_ice_fraction,"(1, 1023, 2047)","[(2021-02-20, 2021-02-21), (2022-11-09, 2022-11-09), (2024-06-02, 2025-04-25)]"
sea_ice_fraction,"(1, 4500, 9000)","[(2023-02-24, 2023-02-28), (2023-04-22, 2023-04-22), (2023-09-04, 2024-06-01)]"
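
The `date_ranges` column above groups consecutive days that share the same value. The implementation of `utils.process_variable_metadata` is not shown in this notebook, so the following is only a sketch of one way such grouping could work. The function `group_date_ranges` is hypothetical, and it assumes a range is broken whenever the value changes or a day is missing. The sample dates follow the `analysed_sst` pattern reported in the table.

```python
from datetime import date, timedelta


def group_date_ranges(values_by_date):
    """Map each distinct value to a list of inclusive (start, end) ranges.

    A range is extended only while the same value appears on consecutive
    days; any change in value or gap in dates starts a new range.
    """
    ranges = {}
    previous_day = None
    previous_value = None
    for day in sorted(values_by_date):
        value = values_by_date[day]
        contiguous = (
            previous_day is not None
            and value == previous_value
            and day - previous_day == timedelta(days=1)
        )
        if contiguous:
            # Extend the most recent range for this value.
            start, _ = ranges[value][-1]
            ranges[value][-1] = (start, day)
        else:
            # Start a new single-day range for this value.
            ranges.setdefault(value, []).append((day, day))
        previous_day, previous_value = day, value
    return ranges


small, large = (1, 1023, 2047), (1, 3600, 7200)
chunk_shape_by_date = {
    date(2023, 2, 22): small,
    date(2023, 2, 23): small,
    date(2023, 2, 24): large,
    date(2023, 2, 25): large,
    date(2023, 2, 26): large,
    date(2023, 2, 27): large,
    date(2023, 2, 28): large,
    date(2023, 3, 1): small,
    date(2023, 3, 2): small,
}

grouped = group_date_ranges(chunk_shape_by_date)
for shape, spans in grouped.items():
    print(shape, [(s.isoformat(), e.isoformat()) for s, e in spans])
```

Because a single differing day splits a range in two, short interruptions such as a one-day change in chunk shape show up as separate entries in the table.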


## Extract periods for which different codecs are used for each variable
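
The codec lists reported in this section are ordered pipelines, so two lists that contain the same codecs in a different order are distinct configurations. The sketch below illustrates this with the standard library only. The `shuffle` and `unshuffle` functions are simplified stand-ins for a byte-shuffle filter and are not the numcodecs implementation; the pipeline helpers `encode` and `decode` are hypothetical and assume codecs are applied in list order when encoding and in reverse order when decoding.

```python
import struct
import zlib


def shuffle(data: bytes, elementsize: int) -> bytes:
    """Group the i-th byte of every element together; leave any tail as-is."""
    n = len(data) // elementsize
    body, tail = data[: n * elementsize], data[n * elementsize:]
    return b"".join(body[i::elementsize] for i in range(elementsize)) + tail


def unshuffle(data: bytes, elementsize: int) -> bytes:
    """Inverse of shuffle."""
    n = len(data) // elementsize
    body, tail = data[: n * elementsize], data[n * elementsize:]
    out = bytearray(len(body))
    for i in range(elementsize):
        out[i::elementsize] = body[i * n: (i + 1) * n]
    return bytes(out) + tail


# Each codec is an (encode, decode) pair.
SHUFFLE = (lambda b: shuffle(b, 2), lambda b: unshuffle(b, 2))
ZLIB = (lambda b: zlib.compress(b, 6), zlib.decompress)


def encode(data, pipeline):
    for enc, _ in pipeline:
        data = enc(data)
    return data


def decode(data, pipeline):
    for _, dec in reversed(pipeline):
        data = dec(data)
    return data


raw = struct.pack("<1000h", *range(1000))  # 1000 little-endian int16 values
shuffle_then_zlib = [SHUFFLE, ZLIB]
zlib_then_shuffle = [ZLIB, SHUFFLE]

a = encode(raw, shuffle_then_zlib)
b = encode(raw, zlib_then_shuffle)

# Decoding with the wrong pipeline order either fails or returns other bytes.
try:
    mismatch = decode(a, zlib_then_shuffle) != raw
except zlib.error:
    mismatch = True

print("encoded sizes:", len(a), len(b))
print("wrong-order decode fails or mismatches:", mismatch)
```

Under these assumptions, a chunk written with one ordering cannot be read back with a store-wide codec list that uses the other ordering, which is why differently ordered lists are reported as separate rows.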

In [23]:
codecs_dict = utils.process_variable_metadata(vdss, variables, 'codecs')

In [24]:
utils.convert_dict_to_df(codecs_dict, 'codecs')

variable,codecs,date_ranges
analysed_sst,"[{'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}, {'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2002-06-01, 2003-09-10), (2003-09-12, 2021-02-19), (2021-02-22, 2021-12-23), (2022-01-27, 2022-11-08), (2022-11-10, 2024-03-05), (2024-03-07, 2024-03-17), (2024-03-19, 2024-05-11)]"
analysed_sst,"[{'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2003-09-11, 2003-09-11)]"
analysed_sst,"[{'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}, {'name': 'numcodecs.zlib', 'configuration': {'level': 7}}]","[(2021-02-20, 2021-02-21), (2022-11-09, 2022-11-09), (2024-03-06, 2024-03-06), (2024-03-18, 2024-03-18), (2024-05-12, 2025-04-25)]"
analysed_sst,"[{'name': 'numcodecs.zlib', 'configuration': {'level': 6}}, {'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}]","[(2021-12-24, 2022-01-26)]"
analysis_error,"[{'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}, {'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2002-06-01, 2003-09-10), (2003-09-12, 2021-02-19), (2021-02-22, 2021-12-23), (2022-01-27, 2022-11-08), (2022-11-10, 2024-03-05), (2024-03-07, 2024-03-17), (2024-03-19, 2024-05-11)]"
analysis_error,"[{'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2003-09-11, 2003-09-11)]"
analysis_error,"[{'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}, {'name': 'numcodecs.zlib', 'configuration': {'level': 7}}]","[(2021-02-20, 2021-02-21), (2022-11-09, 2022-11-09), (2024-03-06, 2024-03-06), (2024-03-18, 2024-03-18), (2024-05-12, 2025-04-25)]"
analysis_error,"[{'name': 'numcodecs.zlib', 'configuration': {'level': 6}}, {'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 2}}]","[(2021-12-24, 2022-01-26)]"
mask,"[{'name': 'numcodecs.shuffle', 'configuration': {'elementsize': 1}}, {'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2002-06-01, 2003-09-10), (2003-09-12, 2021-02-19), (2021-02-22, 2021-12-23), (2022-01-27, 2022-11-08), (2022-11-10, 2024-03-05), (2024-03-07, 2024-03-17), (2024-03-19, 2024-05-11)]"
mask,"[{'name': 'numcodecs.zlib', 'configuration': {'level': 6}}]","[(2003-09-11, 2003-09-11)]"
