This notebook is a minimal example of the thread-locking issue that shows up when reading the HRRR Zarr data directly from S3. Specifically, I want to see whether it reproduces with other package versions.

In [14]:
%%time
import s3fs
import xarray as xr

# Anonymous (unsigned) access to the bucket
s3 = s3fs.S3FileSystem(anon=True)
def lookup(path):
    # Wrap an S3 path as a key-value store that zarr can read
    return s3fs.S3Map(path, s3=s3)

path_forecast = "hrrrzarr/sfc/20211124/20211124_00z_fcst.zarr/surface/PRES"
ds_from_s3 = xr.open_zarr(lookup(f"{path_forecast}/surface"))
# Read every value of PRES once
_ = ds_from_s3.PRES.values

CPU times: user 1.81 s, sys: 540 ms, total: 2.35 s
Wall time: 3.81 s


Confirm that we have all the data downloaded:

In [15]:
def ds_size_mb(ds):
    # nbytes is in bytes; dividing by 1024**2 gives mebibytes (MiB)
    return ds.nbytes / 1024**2

ds_size_mb(ds_from_s3)

348.84173583984375

Now run the slow calculation:

In [31]:
%%time
%%prun -l 2

_ = ds_from_s3.PRES.mean(dim="time").values

 CPU times: user 1.72 s, sys: 382 ms, total: 2.1 s
Wall time: 3.4 s


         55587 function calls (54174 primitive calls) in 3.391 seconds

   Ordered by: internal time
   List reduced from 542 to 2 due to restriction <2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1388    3.329    0.002    3.329    0.002 {method 'acquire' of '_thread.lock' objects}
      193    0.007    0.000    0.007    0.000 local.py:237(release_data)
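
The profile above attributes almost all of the runtime to `acquire` on `_thread.lock`. That is what a profiler typically reports when the profiled thread is waiting on worker threads rather than doing work itself, because cProfile (which `%%prun` uses) only instruments the thread it was started in. The sketch below reproduces that pattern with the standard library only. `fake_chunk_read` and `wait_on_workers` are hypothetical names, `time.sleep` stands in for a blocking read, and nothing here touches dask, zarr, or s3fs, so it illustrates the shape of the profile and does not diagnose this particular issue.

```python
import cProfile
import pstats
import time
from concurrent.futures import ThreadPoolExecutor


def fake_chunk_read(delay_s):
    # Hypothetical stand-in for a blocking chunk fetch; no network involved.
    time.sleep(delay_s)
    return delay_s


def wait_on_workers(n_tasks=8, delay_s=0.05):
    # The calling thread only submits tasks and then waits for the results.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fake_chunk_read, delay_s) for _ in range(n_tasks)]
        return [f.result() for f in futures]


# Profile the calling thread only; worker threads are not instrumented.
profiler = cProfile.Profile()
profiler.enable()
wait_on_workers()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
top_func, top_entry = max(stats.stats.items(), key=lambda kv: kv[1][2])
print(top_func[2], round(top_entry[2], 3))
```

In this sketch the sleeping happens in the workers, yet the profile of the calling thread charges the waiting time to the lock it blocks on. By the same reasoning, the `acquire` line in the notebook's profile says where the main thread waited, not what the workers were doing during that time.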

Confirm that no more data was downloaded:

In [18]:
ds_size_mb(ds_from_s3)

348.84173583984375

Now download the same data locally and repeat the calculation.

In [None]:
# ! aws s3 cp s3://hrrrzarr/sfc/20211124/20211124_00z_fcst.zarr/surface/PRES/surface downloaded.zarr --no-sign-request --recursive

In [19]:
ds_from_local = xr.open_zarr("downloaded.zarr")

In [20]:
ds_size_mb(ds_from_local)

348.84173583984375

In [32]:
%%time
%%prun -l 2

_ = ds_from_local.PRES.mean(dim="time").values

 CPU times: user 954 ms, sys: 299 ms, total: 1.25 s
Wall time: 351 ms


         54279 function calls (52866 primitive calls) in 0.343 seconds

   Ordered by: internal time
   List reduced from 539 to 2 due to restriction <2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      988    0.269    0.000    0.269    0.000 {method 'acquire' of '_thread.lock' objects}
      193    0.018    0.000    0.018    0.000 local.py:237(release_data)
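
To put the two runs side by side, the short calculation below uses only the numbers printed in the two profiles above (total profiled time, `tottime` of the lock `acquire` row, and wall time). The dictionary keys and the `lock_fraction` helper are names made up for this illustration.

```python
# Numbers copied from the %%time / %%prun outputs in this notebook.
s3_profile = {"total_s": 3.391, "lock_s": 3.329, "wall_s": 3.4}
local_profile = {"total_s": 0.343, "lock_s": 0.269, "wall_s": 0.351}


def lock_fraction(profile):
    # Share of the profiled time spent in lock.acquire on the main thread.
    return profile["lock_s"] / profile["total_s"]


s3_lock_fraction = lock_fraction(s3_profile)
local_lock_fraction = lock_fraction(local_profile)
lock_time_ratio = s3_profile["lock_s"] / local_profile["lock_s"]
wall_time_ratio = s3_profile["wall_s"] / local_profile["wall_s"]

print(f"lock share, S3:    {s3_lock_fraction:.1%}")
print(f"lock share, local: {local_lock_fraction:.1%}")
print(f"lock time ratio (S3 / local): {lock_time_ratio:.1f}x")
print(f"wall time ratio (S3 / local): {wall_time_ratio:.1f}x")
```

With these inputs the S3 run spends about 98% of its profiled time in `acquire` versus about 78% for the local run, and it is roughly 12x slower in lock time and 10x slower in wall time.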

In [28]:
import dask
import zarr
import platform
print(f"Python: {platform.python_version()}")
print(f"xarray: {xr.__version__}")
print(f"dask: {dask.__version__}")
print(f"zarr: {zarr.__version__}")
print(f"s3fs: {s3fs.__version__}")

Python: 3.10.0
xarray: 0.20.1
dask: 2021.11.2
zarr: 2.10.3
s3fs: 2021.11.0
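
Since the point of this notebook is to compare behaviour across versions, it can help to collect the same report in an environment where some of these packages are missing. The following is a minimal sketch using `importlib.metadata` from the standard library, which reads installed package metadata without importing the packages. `collect_versions` is a hypothetical helper, and the package list is just the one used above plus fsspec.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {name: version string or None} without importing the packages."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


report = collect_versions(["xarray", "dask", "zarr", "s3fs", "fsspec"])
for name, ver in report.items():
    print(f"{name}: {ver}")
```

Packages that are not installed show up as `None`, which keeps the report comparable between environments.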


In [33]:
xr.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:25:38) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: None
sparse: None
setuptools: 59.2.0
pip: 21.3.1
conda: None
pytest: None
IPython: 7.29.0
sphinx: None
