# Catalog checks

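This notebook runs a few sanity checks on the ACCESS-NRI intake catalog, mainly whether the CMIP6 variables `umo` and `vmo` match the raw MOM output `tx_trans` and `ty_trans`.

Job script options used for gdata storage/access: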

```bash
"-l storage=gdata/xv83+gdata/gh0+gdata/oi10+gdata/dk92+gdata/hh5+gdata/rr3+gdata/al33+gdata/fs38+gdata/xp65+gdata/p66"
```
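
The storage flag is easy to get out of sync with the paths a notebook actually reads; for instance, later cells list `/g/data/p73`, which is not among the projects in the flag. A minimal sketch, assuming the `/g/data/<project>` layout used in this notebook, that rebuilds the flag from a project list and reports projects whose directory is not visible (`storage_flag` and `missing_mounts` are hypothetical helper names):

```python
from pathlib import Path

# Projects taken from the storage flag above
PROJECTS = ["xv83", "gh0", "oi10", "dk92", "hh5", "rr3", "al33", "fs38", "xp65", "p66"]

def storage_flag(projects):
    """Build the job-script storage option from a list of project codes."""
    return "-l storage=" + "+".join(f"gdata/{p}" for p in projects)

def missing_mounts(projects, root="/g/data"):
    """Return the projects whose directory under `root` is not visible."""
    return [p for p in projects if not (Path(root) / p).is_dir()]

print(storage_flag(PROJECTS))
print("Not visible:", missing_mounts(PROJECTS))
```

Running `missing_mounts` at the top of a session gives an early hint that a project is missing from the job's storage request, before a later cell fails on a path.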

## 1. Load packages

In [1]:
# Ignore warnings
import os
from os import environ
environ["PYTHONWARNINGS"] = "ignore"
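
Python reads `PYTHONWARNINGS` when an interpreter starts, so setting it here mainly affects child processes started afterwards; it does not change the warning filters of the kernel that is already running. A minimal sketch, using only the standard-library `warnings` module, of silencing warnings in the current process as well:

```python
import os
import warnings

# Inherited by child processes started after this point
os.environ["PYTHONWARNINGS"] = "ignore"

# Applies to the interpreter that is already running
warnings.simplefilter("ignore")

# Quick check: a warning raised under the "ignore" filter is not recorded
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")
    warnings.warn("this warning is suppressed")

print(len(caught))
```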

In [2]:
import glob

In [3]:
# Load intake (gives access to the ACCESS-NRI catalog)
import intake

# Load numpy for numbers!
import numpy as np

# Load pandas for DataFrame manipulations
import pandas as pd

In [4]:
import xarray as xr


In [5]:
import datetime

In [6]:
# Load xmip for preprocessing (trying to get consistent metadata for making matrices down the road)
import xmip
from xmip.preprocessing import combined_preprocessing


In [7]:
ls /g/data/p66

[0m[36maccessdev-web[0m/  [36mct5255[0m/  [36mjk8585[0m/                [36mpath[0m/    [36msjr900[0m/
[36mACCESSDIR[0m/      [36mczl599[0m/  [36mjrc599[0m/                [36mpbd562[0m/  [36mslf563[0m/
[36mach599[0m/         [36mdei561[0m/  [36mjxs599[0m/                [36mpfu599[0m/  [36mspo599[0m/
[36mafm599[0m/         [36mdhb599[0m/  [36mkjl574[0m/                [36mpfv599[0m/  [36mtfl561[0m/
[36majd563[0m/         [36mfd0474[0m/  [36mkxd599[0m/                [36mpg6445[0m/  [36mtxz599[0m/
[36makl599[0m/         [36mfsg599[0m/  [32mlist_of_p66_users.txt[0m  [36mpjb581[0m/  [32musage_20240504.txt[0m
[36mars599[0m/         [36mglr548[0m/  [36mlxs599[0m/                [36mpju565[0m/  [32musage_ars599.txt[0m
[36mas7904[0m/         [36mhar599[0m/  [36mlz3062[0m/                [36mqh5472[0m/  [36mwgh581[0m/
[36maxl599[0m/         [36mhhn548[0m/  [36mmac599[0m/                [36mrb4844[0m/  [3

In [8]:
ls /g/data/p73/archive

[0m[36mCMIP6[0m/  [36mCMIP7-test[0m/  [36mmachine_learning[0m/  [36mnon-CMIP[0m/  [32mREADME.txt[0m  [36mstaging[0m/  [36mtmp[0m/


In [9]:
cat /g/data/p73/archive/README.txt

ACCESS GCM Model Output Archive

contact: chloe.mackallah@csiro.au

This purpose of this project is the mid- to long-term archiving of ACCESS GCM simulation data; 
and is a sibling to p66 (my.nci.org.au/mancini/project/p66). 
Primarily, this archive contains model output from ACCESS-CM2 and ACCESS-ESM1.5 
production runs for CMIP6, produced by CSIRO in collaboration with ARCCSS (i.e. CLEX), 
but also includes non-CMIP simulations, especially those used in publications.

Model output from the UM (atmosphere component) has been converted to netCDF format 
using Iris (older runs used cdms2), while output from MOM (ocean) and CICE (sea-ice) 
have been compressed. Restart files for every 10th simulations year are saved along 
with model output, while subdaily data is moved to NCI's massdata system.

For CF-compliant datasets from ACCESS CMIP6 production simulations, which are published 
through the ESGF for CMIP6, please see the local publication node, project fs38 
(my.nci.org.au/mancini/p

In [10]:
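# Paths to the raw model output
ACCESS_ESM_path = "/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn"
# NOTE: currently identical to ACCESS_ESM_path (still the ACCESS-ESM1-5 directory) and not used below
ACCESS_CM_path = "/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn"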

In [11]:
ACCESS_ESM_monthly_files = np.sort(glob.glob(f'{ACCESS_ESM_path}/*month*'))
ACCESS_ESM_monthly_files

array(['/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18501231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18511231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18521231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18531231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18541231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18551231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18561231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18571231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18581231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18591231',
       '/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc
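
The file names above end in a date stamp (for example `ocean_month.nc-18501231`). A minimal sketch, assuming that suffix is always `YYYYMMDD`, of selecting files by year without opening them (`file_end_date` and `files_for_years` are hypothetical helpers, not part of the notebook):

```python
from datetime import datetime

def file_end_date(path):
    """Parse the trailing YYYYMMDD stamp of a name like 'ocean_month.nc-18501231'."""
    stamp = path.rsplit("-", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d")

def files_for_years(paths, first_year, last_year):
    """Keep the paths whose date stamp falls in [first_year, last_year]."""
    return [p for p in paths if first_year <= file_end_date(p).year <= last_year]

example = [
    "/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18501231",
    "/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18511231",
    "/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/HI-05/history/ocn/ocean_month.nc-18521231",
]
print(files_for_years(example, 1851, 1852))
```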

In [13]:
foo = xr.open_dataset(ACCESS_ESM_monthly_files[0])
foo

## Check if umo == tx_trans and vmo == ty_trans

In [18]:

def find_latest_version(cat):
    """
    return the latest version of the selected data (versions are compared as strings)
    """
    sorted_versions = cat.df.version.to_list()
    sorted_versions.sort()
    latest_version = sorted_versions[-1]
    return latest_version

def select_latest_cat(cat, **kwargs):
    """
    search the catalog and keep only the latest version of the selected data
    """
    selectedcat = cat.search(**kwargs)
    # if dataframe is empty, error
    if selectedcat.df.empty:
        raise ValueError(f"No data found for {kwargs}")

    latestselectedcat = selectedcat.search(version=find_latest_version(selectedcat))
    return latestselectedcat

def select_latest_data(cat, xarray_open_kwargs, **kwargs):
    """
    load the latest version of the selected data as a dask-backed xarray Dataset
    """
    latestselectedcat = select_latest_cat(cat, **kwargs)
    print("\nlatestselectedcat: ", latestselectedcat)
    xarray_combine_by_coords_kwargs=dict(
        compat="override",
        data_vars="minimal",
        coords="minimal"
    )
    datadask = latestselectedcat.to_dask(
        xarray_open_kwargs=xarray_open_kwargs,
        xarray_combine_by_coords_kwargs=xarray_combine_by_coords_kwargs,
        parallel=True,
        preprocess=combined_preprocessing,
    )
    return datadask
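
`find_latest_version` relies on a plain string sort. That works when version labels are fixed-width date stamps, which is assumed here to be the case for this catalog (labels such as `v20191115`), but it would misorder labels of varying width. A small standard-library sketch of the difference, using made-up version lists and a hypothetical `latest_version_numeric` alternative:

```python
def latest_version(versions):
    """Same rule as find_latest_version: last element after a string sort."""
    return sorted(versions)[-1]

def latest_version_numeric(versions):
    """Hypothetical alternative: compare the integer after the leading 'v'."""
    return max(versions, key=lambda v: int(v.lstrip("v")))

dated = ["v20191115", "v20200817", "v20191203"]  # made-up, fixed-width labels
short = ["v9", "v10"]                            # made-up, varying-width labels

print(latest_version(dated), latest_version_numeric(dated))
print(latest_version(short), latest_version_numeric(short))
```

Both rules agree on the fixed-width labels; only the numeric rule picks `v10` over `v9`.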


In [16]:
catalogs = intake.cat.access_nri

In [17]:
cat_cmip6_Aus = catalogs["cmip6_fs38"]
cat_cmip6_Aus

| | unique |
|---|---|
| path | 1054133 |
| file_type | 2 |
| realm | 7 |
| frequency | 10 |
| table_id | 24 |
| project_id | 1 |
| institution_id | 3 |
| source_id | 4 |
| experiment_id | 52 |
| member_id | 80 |


In [72]:
searched_cat = cat_cmip6_Aus.search(
    source_id = "ACCESS-ESM1-5",
    experiment_id = "historical",
    variable_id = ["umo", "vmo"],
    realm = 'ocean')
print(searched_cat)


<cmip6-fs38 catalog with 160 dataset(s) from 2720 asset(s)>


In [73]:
umo_datadask = select_latest_data(searched_cat,
    dict(
        # chunks={'i': 60, 'j': 60, 'time': -1, 'lev':50}
    ),
    variable_id = "umo",
    member_id = "r2i1p1f1",
    frequency = "mon",
)
umo_datadask


latestselectedcat:  <cmip6-fs38 catalog with 1 dataset(s) from 17 asset(s)>


| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (1980, 2) | (1, 2) | 30.94 kiB | 16 B | 1980 chunks in 35 graph layers | datetime64[ns] |
| (50, 2) | (50, 2) | 800 B | 800 B | 1 chunks in 2 graph layers | float64 |
| (300, 360) | (300, 360) | 843.75 kiB | 843.75 kiB | 1 chunks in 5 graph layers | float64 |
| (300, 360) | (300, 360) | 843.75 kiB | 843.75 kiB | 1 chunks in 8 graph layers | float64 |
| (300, 360, 4) | (300, 360, 2) | 3.30 MiB | 1.65 MiB | 3 chunks in 3 graph layers | float64 |
| (300, 360, 4) | (300, 360, 2) | 3.30 MiB | 1.65 MiB | 3 chunks in 6 graph layers | float64 |
| (2, 300, 360) | (1, 300, 360) | 1.65 MiB | 843.75 kiB | 2 chunks in 15 graph layers | float64 |
| (2, 300, 360) | (1, 300, 360) | 1.65 MiB | 843.75 kiB | 2 chunks in 12 graph layers | float64 |
| (1980, 50, 300, 360) | (1, 25, 150, 180) | 39.83 GiB | 2.57 MiB | 15840 chunks in 35 graph layers | float32 |


In [74]:
umo = umo_datadask.isel(time=0).umo
umo

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (50, 300, 360) | (25, 150, 180) | 20.60 MiB | 2.57 MiB | 8 chunks in 36 graph layers | float32 |
| (300, 360) | (300, 360) | 843.75 kiB | 843.75 kiB | 1 chunks in 5 graph layers | float64 |
| (300, 360) | (300, 360) | 843.75 kiB | 843.75 kiB | 1 chunks in 8 graph layers | float64 |


In [75]:
tx_trans = foo.tx_trans.isel(time=0)
tx_trans

In [76]:
import matplotlib.pyplot as plt

In [77]:
xdiff = np.nanmean(tx_trans.data - umo.data)
xdiff

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| () | () | 4 B | 4 B | 1 chunks in 41 graph layers | float32 |


In [78]:
xdiff.compute()

-879491.25
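
A single mean difference can hide large local mismatches because positive and negative differences cancel, so on its own the value above does not show how the two fields differ. A minimal sketch, on synthetic NumPy arrays rather than the real `tx_trans` and `umo` fields, of a comparison that also reports the largest absolute difference and an element-wise closeness test (`compare_fields` is a hypothetical helper; it assumes both arrays share the same shape, grid and units):

```python
import numpy as np

def compare_fields(a, b, rtol=1e-5):
    """Summarise the difference between two arrays, ignoring non-finite points."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    valid = np.isfinite(a) & np.isfinite(b)
    diff = a[valid] - b[valid]
    return {
        "mean_diff": float(diff.mean()),
        "max_abs_diff": float(np.abs(diff).max()),
        "allclose": bool(np.allclose(a[valid], b[valid], rtol=rtol)),
    }

# Synthetic example: the differences cancel in the mean but the fields disagree
a = np.array([1.0, -1.0, np.nan])
b = np.array([-1.0, 1.0, np.nan])
result = compare_fields(a, b)
print(result)
```

In this synthetic case the mean difference is zero even though every valid point differs, which is why the maximum absolute difference is worth checking alongside the mean.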

In [17]:
year_start = 1990
num_years = 10

In [28]:
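# Build the output directory path on scratch
# (model, experiment and ensemble match the output path printed below)
model = "ACCESS-ESM1-5"
experiment = "historical"
ensemble = "r1i1p1f1"
datadir = '/scratch/xv83/TMIP/data'
# NOTE: time_window_strings is assumed to be defined or imported before this cell
start_time, end_time = time_window_strings(year_start, num_years)
start_time_str = start_time.strftime("%b%Y")
end_time_str = end_time.strftime("%b%Y")

outputdir = f'{datadir}/{model}/{experiment}/{ensemble}/{start_time_str}-{end_time_str}'
print(outputdir)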

/scratch/xv83/TMIP/data/ACCESS-ESM1-5/historical/r1i1p1f1/Jan1990-Dec1999
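
The notebook calls `time_window_strings` without showing its definition. A hypothetical stand-in, assuming it returns the first and last day of a window of `num_years` whole calendar years as `datetime` objects (this is consistent with the `Jan1990-Dec1999` label printed above, but it is not the author's confirmed implementation):

```python
import datetime

def time_window_strings(year_start, num_years):
    """Hypothetical stand-in: bounds of a window of whole calendar years."""
    start_time = datetime.datetime(year_start, 1, 1)
    end_time = datetime.datetime(year_start + num_years - 1, 12, 31)
    return start_time, end_time

start_time, end_time = time_window_strings(1990, 10)
# Month abbreviations from %b depend on the active locale
print(start_time.strftime("%b%Y"), end_time.strftime("%b%Y"))
```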


In [29]:
os.makedirs(outputdir, exist_ok=True)

In [30]:
########## Start the client and make the `.nc` files ##########
from dask.distributed import Client

print("Starting client")
# Note: with threads_per_worker=1 (and memory_limit='16GB') thetao could not be plotted; the cause is not yet understood
client = Client(n_workers=4)
client

Starting client


In [31]:
# umo dataset
print("Loading umo data")
umo_datadask = select_latest_data(searched_cat,
    dict(
        chunks={'i': 60, 'j': 60, 'time': -1, 'lev':50}
    ),
    variable_id = "umo",
    frequency = "mon",
)
print("\numo_datadask: ", umo_datadask)

Loading umo data

umo_datadask:  <xarray.Dataset> Size: 43GB
Dimensions:             (time: 1980, lev: 50, j: 300, i: 360, bnds: 2,
                         vertices: 4)
Coordinates:
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(120, 2), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 294 295 296 297 298 299
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
    latitude            (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
    longitude           (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(60, 60, 2), meta=np.ndarray>
    

In [32]:
# vmo dataset
print("Loading vmo data")
vmo_datadask = select_latest_data(searched_cat,
    dict(
        chunks={'i': 60, 'j': 60, 'time': -1, 'lev':50}
    ),
    variable_id = "vmo",
    frequency = "mon",
)
print("\nvmo_datadask: ", vmo_datadask)

Loading vmo data

vmo_datadask:  <xarray.Dataset> Size: 43GB
Dimensions:             (time: 1980, lev: 50, j: 300, i: 360, bnds: 2,
                         vertices: 4)
Coordinates:
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(120, 2), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 294 295 296 297 298 299
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
    latitude            (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
    longitude           (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(60, 60, 2), meta=np.ndarray>
    

In [33]:
# mlotst dataset
print("Loading mlotst data")
mlotst_datadask = select_latest_data(searched_cat,
    dict(
        chunks={'i': 60, 'j': 60, 'time': -1, 'lev':50}
    ),
    variable_id = "mlotst",
    frequency = "mon",
)
print("\nmlotst_datadask: ", mlotst_datadask)

Loading mlotst data

mlotst_datadask:  <xarray.Dataset> Size: 864MB
Dimensions:             (time: 1980, bnds: 2, j: 300, i: 360, vertices: 4)
Coordinates:
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 294 295 296 297 298 299
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
    latitude            (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
    longitude           (j, i) float64 864kB dask.array<chunksize=(60, 60), meta=np.ndarray>
Dimensions without coordinates: bnds, vertices
Data variables:
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(60, 60, 2), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 3MB dask.array<chunksize=(60, 60, 2), meta=np.ndarray>
    mlotst              (time, j, i) flo

In [34]:
# thkcello is dealt with in a different script,
# given that its location (fixed or time-dependent) depends on the model and/or project
# # thkcello dataset
# print("Loading thkcello data")
# thkcello_datadask = select_latest_data(searched_cat,
#     dict(
#         chunks={'i': 60, 'j': 60, 'time': -1, 'lev':50}
#     ),
#     variable_id = "thkcello",
#     frequency = "mon",
# )
# print("\nthkcello_datadask: ", thkcello_datadask)

In [35]:
# Slice umo dataset for the time period
umo_datadask_sel = umo_datadask.sel(time=slice(start_time, end_time))
# Take the time average of the monthly umo (using month length as weights)
umo = umo_datadask_sel["umo"].weighted(umo_datadask_sel.time.dt.days_in_month).mean(dim="time")
umo

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (50, 300, 360) | (50, 60, 60) | 41.20 MiB | 1.37 MiB | 30 chunks in 50 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
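
Weighting by `days_in_month` gives each monthly mean a weight proportional to the length of its month, i.e. $\bar{x} = \sum_m d_m x_m / \sum_m d_m$. A worked example with made-up values for January and February 1990, using only the standard library:

```python
import calendar

# Made-up monthly means for January and February 1990
values = {1: 1.0, 2: 4.0}

# Days in each month (31 and 28; 1990 is not a leap year)
days = {m: calendar.monthrange(1990, m)[1] for m in values}

weighted = sum(days[m] * values[m] for m in values) / sum(days.values())
unweighted = sum(values.values()) / len(values)

print(weighted, unweighted)
```

Here the weighted mean is 143/59, slightly below the unweighted mean of 2.5, because the shorter February carries less weight.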


In [36]:
# Slice vmo dataset for the time period
vmo_datadask_sel = vmo_datadask.sel(time=slice(start_time, end_time))
# Take the time average of the monthly vmo (using month length as weights)
vmo = vmo_datadask_sel["vmo"].weighted(vmo_datadask_sel.time.dt.days_in_month).mean(dim="time")
vmo

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (50, 300, 360) | (50, 60, 60) | 41.20 MiB | 1.37 MiB | 30 chunks in 50 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |


In [37]:
# Slice mlotst dataset for the time period
mlotst_datadask_sel = mlotst_datadask.sel(time=slice(start_time, end_time))
# Take the yearly maximum of mlotst (its mean over years is taken in the next cell)
mlotst_yearlymax = mlotst_datadask_sel.groupby("time.year").max(dim="time")
mlotst_yearlymax

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (10, 2) | (10, 2) | 160 B | 160 B | 1 chunks in 20 graph layers | datetime64[ns] |
| (10, 300, 360) | (10, 60, 60) | 4.12 MiB | 140.62 kiB | 30 chunks in 8 graph layers | float32 |
| (10, 300, 360, 4) | (10, 60, 60, 2) | 32.96 MiB | 562.50 kiB | 60 chunks in 3 graph layers | float64 |
| (10, 300, 360, 4) | (10, 60, 60, 2) | 32.96 MiB | 562.50 kiB | 60 chunks in 3 graph layers | float64 |


In [38]:
mlotst = mlotst_yearlymax.mean(dim="year")
mlotst

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (300, 360) | (60, 60) | 843.75 kiB | 28.12 kiB | 30 chunks in 2 graph layers | float64 |
| (300, 360) | (60, 60) | 421.88 kiB | 14.06 kiB | 30 chunks in 10 graph layers | float32 |
| (300, 360, 4) | (60, 60, 2) | 3.30 MiB | 56.25 kiB | 60 chunks in 5 graph layers | float64 |
| (300, 360, 4) | (60, 60, 2) | 3.30 MiB | 56.25 kiB | 60 chunks in 5 graph layers | float64 |
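
The two steps above (yearly maximum, then mean over years) can be illustrated at a single grid point. A standard-library sketch with made-up monthly mixed-layer depths:

```python
from collections import defaultdict

# Made-up monthly mixed-layer depths at one grid point, keyed by (year, month)
monthly = {
    (1990, 8): 120.0, (1990, 9): 150.0, (1990, 10): 90.0,
    (1991, 8): 200.0, (1991, 9): 110.0, (1991, 10): 95.0,
}

# Step 1: maximum within each year (like groupby("time.year").max)
yearly_max = defaultdict(lambda: float("-inf"))
for (year, _month), depth in monthly.items():
    yearly_max[year] = max(yearly_max[year], depth)

# Step 2: mean of the yearly maxima (like .mean(dim="year"))
mean_of_yearly_max = sum(yearly_max.values()) / len(yearly_max)

print(dict(yearly_max), mean_of_yearly_max)
```

The yearly maxima are 150 and 200, giving a mean of 175; the maximum can fall in a different month each year, which is why the maximum is taken before averaging.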


In [39]:
# # Slice thkcello dataset for the time period
# thkcello_datadask_sel = thkcello_datadask.sel(time=slice(start_time, end_time))
# # Take the time average of the monthly thkcello (using month length as weights)
# thkcello = thkcello_datadask_sel["thkcello"].weighted(thkcello_datadask_sel.time.dt.days_in_month).mean(dim="time")

In [40]:
# Save to netcdfs (and compute!)
umo.to_netcdf(f'{outputdir}/umo.nc', compute=True)
vmo.to_netcdf(f'{outputdir}/vmo.nc', compute=True)
mlotst.to_netcdf(f'{outputdir}/mlotst.nc', compute=True)
# thkcello.to_netcdf(f'{outputdir}/thkcello.nc', compute=True)

In [41]:
client.close()