# Example usage of the CMIP6 Object Store on JASMIN

## What is the CMIP6 Object Store on JASMIN?

We have copied a subset of high-priority CMIP6 data from our NetCDF archive to our local Object Store on JASMIN. The files are stored in Zarr format instead of NetCDF.

Users working on JASMIN (in an SSH terminal or a notebook environment) can read directly from the object store.

## Do you need any special libraries?

Yes. You will need `s3fs` and `zarr`, which can be installed with `pip` into a virtual environment. The cells below also upgrade `xarray`.

In [1]:
# Import the required packages
import virtualenv
import pip
import os

# Define and create the base directory in which to install virtual environments
venvs_dir = os.path.join(os.path.expanduser("~"), "nb-venvs")

if not os.path.isdir(venvs_dir):
    os.makedirs(venvs_dir)
    
# Define the venv directory
venv_dir = os.path.join(venvs_dir, 'venv-notebook')

In [2]:
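# Create the virtual environment
if not os.path.isdir(venv_dir):
    if hasattr(virtualenv, "cli_run"):
        virtualenv.cli_run([venv_dir])  # virtualenv >= 20
    else:
        virtualenv.create_environment(venv_dir)  # older virtualenv releases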

In [3]:
# Activate the venv
activate_file = os.path.join(venv_dir, "bin", "activate_this.py")
exec(open(activate_file).read(), dict(__file__=activate_file))

In [20]:
# pip install the packages, using the venv as the install prefix
pip.main(["install", "--prefix", venv_dir, "s3fs"])
pip.main(["install", "--prefix", venv_dir, "zarr"])
pip.main(["install", "--prefix", venv_dir, "--upgrade", "xarray"])

Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.

Collecting xarray
  Using cached xarray-0.16.1-py3-none-any.whl (720 kB)
Installing collected packages: xarray
  Attempting uninstall: xarray
    Found existing installation: xarray 0.11.0
    Not uninstalling xarray at /opt/jaspy/lib/python3.7/site-packages, outside environment /home/users/astephen/nb-venvs/venv-notebook
    Can't uninstall 'xarray'. No files were found to uninstall.
Successfully installed xarray-0.16.1

You should consider upgrading via the '/opt/jaspy/bin/python -m pip install --upgrade pip' command.


0
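
The pip output above recommends invoking pip as `python -m pip` rather than calling it from within Python. A minimal sketch of that approach follows, assuming the same virtual environment location as the earlier cells. The helper names `build_pip_command` and `run_installs` are hypothetical and not part of the original notebook.

```python
import os
import subprocess
import sys


def build_pip_command(package, prefix, upgrade=False):
    """Build a `python -m pip install` command list (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install", "--prefix", prefix]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def run_installs(commands):
    """Run each pip command in turn, raising if any of them fails."""
    for cmd in commands:
        subprocess.check_call(cmd)


# Same venv location as used earlier in this notebook
venv_dir = os.path.join(os.path.expanduser("~"), "nb-venvs", "venv-notebook")

commands = [
    build_pip_command("s3fs", venv_dir),
    build_pip_command("zarr", venv_dir),
    build_pip_command("xarray", venv_dir, upgrade=True),
]

for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_installs(commands)` would then perform the installs. Using `sys.executable` keeps pip tied to the interpreter that is running the notebook.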

## How to read from the object store

Let's start with a dataset in the archive, and map it to an object store URL.

In [4]:
from urllib.parse import urlparse

# In the CEDA archive we have:
archive_path = ("/badc/cmip6/data/CMIP6/AerChemMIP/NIMS-KMA/UKESM1-0-LL/"
                "hist-piNTCF/r3i1p1f2/Amon/evspsbl/gn/v20200224")

# Work out the Zarr URL from that:
def map_archive_path(archive_path):
    """Return the Zarr URL derived from a CEDA archive path."""
    # Base URL of the object store endpoint
    base_url = "http://cmip6-zarr-o.s3.jc.rl.ac.uk"
    items = archive_path.replace("/badc/cmip6/data", "").strip("/").split("/")
    # The first four path components form the first URL segment,
    # and the remaining components name the Zarr store
    i_start = ".".join(items[:4])
    i_end = ".".join(items[4:]) + ".zarr"

    return f"{base_url}/{i_start}/{i_end}"

zarr_url = map_archive_path(archive_path)
print(zarr_url)

http://cmip6-zarr-o.s3.jc.rl.ac.uk/CMIP6.AerChemMIP.NIMS-KMA.UKESM1-0-LL/hist-piNTCF.r3i1p1f2.Amon.evspsbl.gn.v20200224.zarr
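
To make the mapping easier to follow, the sketch below labels each component of the archive path before joining them. The facet names are taken from the CMIP6 Data Reference Syntax directory template and are an assumption about how this archive is laid out. `split_facets` is a hypothetical helper, not part of the original notebook.

```python
# Facet names from the CMIP6 Data Reference Syntax directory template
# (assumed here to match the layout under /badc/cmip6/data)
DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
)


def split_facets(archive_path, prefix="/badc/cmip6/data"):
    """Return a dict mapping DRS facet names to path components."""
    items = archive_path.replace(prefix, "").strip("/").split("/")
    if len(items) != len(DRS_FACETS):
        raise ValueError(f"Expected {len(DRS_FACETS)} components, got {len(items)}")
    return dict(zip(DRS_FACETS, items))


archive_path = ("/badc/cmip6/data/CMIP6/AerChemMIP/NIMS-KMA/UKESM1-0-LL/"
                "hist-piNTCF/r3i1p1f2/Amon/evspsbl/gn/v20200224")

facets = split_facets(archive_path)
values = list(facets.values())

# The first four facets form the first URL segment, the rest name the store
first_segment = ".".join(values[:4])
store_name = ".".join(values[4:]) + ".zarr"

for name, value in facets.items():
    print(f"{name}: {value}")
```

Here `first_segment` and `store_name` correspond to `i_start` and `i_end` in `map_archive_path`.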


In [5]:
# Get the components of the URL, and set variables
url_comps = urlparse(zarr_url)
endpoint = f"{url_comps.scheme}://{url_comps.netloc}"
zarr_path = url_comps.path
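
As an illustration of what `urlparse` returns for the example URL, the sketch below splits it into the endpoint and the path. It then splits the path into a bucket and a key, assuming path-style S3 addressing in which the first path segment is the bucket. The names `bucket` and `key` are introduced here for illustration only.

```python
from urllib.parse import urlparse

zarr_url = ("http://cmip6-zarr-o.s3.jc.rl.ac.uk/"
            "CMIP6.AerChemMIP.NIMS-KMA.UKESM1-0-LL/"
            "hist-piNTCF.r3i1p1f2.Amon.evspsbl.gn.v20200224.zarr")

url_comps = urlparse(zarr_url)

# Scheme plus host gives the endpoint that the S3 client connects to
endpoint = f"{url_comps.scheme}://{url_comps.netloc}"

# Assumption: path-style addressing, so the first path segment is the bucket
bucket, _, key = url_comps.path.lstrip("/").partition("/")

print(endpoint)  # http://cmip6-zarr-o.s3.jc.rl.ac.uk
print(bucket)    # CMIP6.AerChemMIP.NIMS-KMA.UKESM1-0-LL
print(key)       # hist-piNTCF.r3i1p1f2.Amon.evspsbl.gn.v20200224.zarr
```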

In [6]:
# Now set up an S3FileSystem object to connect to the Zarr Store
import s3fs
import xarray as xr

jasmin_s3 = s3fs.S3FileSystem(anon=True, 
                              client_kwargs={"endpoint_url": endpoint})
s3_store = s3fs.S3Map(root=zarr_path, s3=jasmin_s3)

# And open the Zarr store directly with xarray
ds = xr.open_zarr(store=s3_store, consolidated=True)

With the dataset open, we can operate on one of its variables:

In [7]:
variable = ds.evspsbl

print(f'Subset shape: {variable.shape}')
print(f'Min, max: {float(variable.min())}, {float(variable.max())}')
print(f'Units: {variable.units}')

Subset shape: (1980, 144, 192)
Min, max: -2.2633825210505165e-05, 0.00022682354028802365
Units: kg m-2 s-1
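
The reported units, kg m-2 s-1, can be hard to read at a glance. As a rough worked example, the sketch below converts the minimum and maximum printed above to mm/day, assuming liquid water at a density of 1000 kg m-3, so that 1 kg m-2 corresponds to a 1 mm layer. `flux_to_mm_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400


def flux_to_mm_per_day(flux_kg_m2_s):
    """Convert a water flux in kg m-2 s-1 to mm/day.

    Assumes a water density of 1000 kg m-3, so 1 kg m-2 equals 1 mm depth.
    """
    return flux_kg_m2_s * SECONDS_PER_DAY


# Values copied from the output printed above
reported_min = -2.2633825210505165e-05
reported_max = 0.00022682354028802365

min_mm_day = flux_to_mm_per_day(reported_min)
max_mm_day = flux_to_mm_per_day(reported_max)

print(f"Min, max (mm/day): {min_mm_day:.2f}, {max_mm_day:.2f}")
```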
