# yt_xarray

linking yt & xarray

* code: https://github.com/data-exp-lab/yt_xarray/
* docs: https://yt-xarray.readthedocs.io

this presentation: https://github.com/chrishavlin/yt_xarray_walkthrough_dxl 

built with: https://github.com/deathbeds/jupyterlab-deck

## xarray

Multidimensional array IO:

* self-describing data formats (netcdf, ...)

* metadata preservation, arbitrary dimension names

* distributed support (chunks to files): 
    * dask arrays 
    * zarr arrays

Load in a [GEOS](https://gmao.gsfc.nasa.gov/GEOS_systems/) dataset (~2 GB, NASA Global Modeling and Assimilation Office):

In [60]:
import xarray as xr 
import os 

fname_geos = os.path.expanduser("~/hdd/data/yt_data/yt_sample_sets/geos/GEOS.fp.asm.inst3_3d_aer_Nv.20180822_0900.V01.nc4")
ds = xr.open_dataset(fname_geos)
ds

data variable access:

In [6]:
ds.data_vars['AIRDENS']

extract dimension and coordinate info:

In [10]:
ds.AIRDENS.coords['lon']

## Data selection with xarray 

### np-style array access and slicing

In [14]:
ds.AIRDENS.dims

('time', 'lev', 'lat', 'lon')

extracting raw np arrays:

In [15]:
ds.AIRDENS[0,0,:,:].to_numpy() 

array([[2.5341093e-05, 2.5341093e-05, 2.5341093e-05, ..., 2.5341093e-05,
        2.5341093e-05, 2.5341093e-05],
       [2.5462165e-05, 2.5462165e-05, 2.5460302e-05, ..., 2.5462165e-05,
        2.5462165e-05, 2.5462165e-05],
       [2.5633528e-05, 2.5633528e-05, 2.5633528e-05, ..., 2.5633528e-05,
        2.5633528e-05, 2.5633528e-05],
       ...,
       [2.8719931e-05, 2.8719931e-05, 2.8719931e-05, ..., 2.8721794e-05,
        2.8721794e-05, 2.8721794e-05],
       [2.8764634e-05, 2.8764634e-05, 2.8764634e-05, ..., 2.8764634e-05,
        2.8764634e-05, 2.8764634e-05],
       [2.8788849e-05, 2.8788849e-05, 2.8788849e-05, ..., 2.8788849e-05,
        2.8788849e-05, 2.8788849e-05]], dtype=float32)

need to remember axis ordering!
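
Why positional indexing is fragile can be seen with plain NumPy. The sketch below uses a hypothetical helper, `index_by_name` (not an xarray function), to build the positional index from dimension names, which is roughly the bookkeeping xarray takes off your hands. The array and dimension tuple are made up for illustration.

```python
import numpy as np

def index_by_name(arr, dims, **indexers):
    """Index `arr` by dimension name; dimensions not named are kept whole."""
    index = tuple(indexers.get(dim, slice(None)) for dim in dims)
    return arr[index]

dims = ("time", "lev", "lat", "lon")
data = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# positional: correct only while the axis order is (time, lev, lat, lon)
positional = data[0, 0, :, :]

# by name: independent of where 'time' and 'lev' sit in `dims`
named = index_by_name(data, dims, time=0, lev=0)
assert np.array_equal(positional, named)

# same data with the axes reversed: data_t[0, 0, :, :] would now pick
# lon=0, lat=0, while the name-based lookup still returns the same slice
dims_t = ("lon", "lat", "lev", "time")
data_t = data.transpose(3, 2, 1, 0)
named_t = index_by_name(data_t, dims_t, time=0, lev=0)  # shape (lon, lat)
assert np.array_equal(named_t, positional.T)
```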

### selection by coordinate **name**

by index (`isel`):

In [16]:
ds.AIRDENS.isel(lev=1, time=0, lat=2)

by **exact** value (`sel`):

In [61]:
ds.AIRDENS.sel(lev=2.1, lat=-89.5)

KeyError: "not all values found in index 'lev'. Try setting the `method` keyword argument (example: method='nearest')."

with some fuzziness: 

In [62]:
ds.AIRDENS.sel(lev=2.1, method="nearest")

finally, with dictionary:

In [19]:
ds.AIRDENS.sel({"lev":2.0, "lat":-89.0})  # important for yt_xarray!
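
The difference between an exact and a "nearest" lookup can be sketched in a few lines of NumPy. This is a conceptual illustration only, not how xarray implements `sel`; the helper `nearest_index` and the coordinate values are made up.

```python
import numpy as np

def nearest_index(coord, value):
    """Position of the entry in a 1D coordinate closest to `value`."""
    return int(np.abs(np.asarray(coord) - value).argmin())

lev = np.array([1.0, 2.0, 3.0, 4.0])  # made-up coordinate values

exact = np.flatnonzero(lev == 2.1)    # exact match: nothing found
i = nearest_index(lev, 2.1)           # nearest match: position of 2.0
print(exact.size, i, lev[i])
```

The position returned by the nearest lookup is what an `isel`-style selection would then use.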

## xarray & dask 

In [20]:
ds.close()
del ds

Start dask client

In [21]:
from dask.distributed import Client
c = Client(n_workers=max(1, (os.cpu_count() or 1) - 2), threads_per_worker=1)  # keep at least one worker

Test data set ([generated from here](https://github.com/chrishavlin/yt-xarray-dask-sandbox/blob/main/example.ipynb)):
* random field data 
* 1000 chunks
* 1 chunk = 1 .nc file

In [22]:
data_dir = os.path.expanduser("~/hdd/data/yt_data/yt_sample_sets/yt_xarray_test_data/dask_mf/data")
dask_test_ds = os.path.join(data_dir, "*.nc")
ds = xr.open_mfdataset(dask_test_ds)
ds

Each data variable has the same dask layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 0.99 GiB | 1.01 MiB |
| Shape | (510, 510, 510) | (51, 51, 51) |
| Dask graph | 1000 chunks in 2111 graph layers | |
| Data type | float64 numpy.ndarray | |


In [23]:
ds.temperature

| | Array | Chunk |
|---|---|---|
| Bytes | 0.99 GiB | 1.01 MiB |
| Shape | (510, 510, 510) | (51, 51, 51) |
| Dask graph | 1000 chunks in 2111 graph layers | |
| Data type | float64 numpy.ndarray | |


* **Coordinates** are in memory and span all chunks!
* **Data variables** are dask arrays

**Returning in-memory values:**

In [24]:
ds.temperature.mean()

| | Array | Chunk |
|---|---|---|
| Bytes | 8 B | 8 B |
| Shape | () | () |
| Dask graph | 1 chunks in 2116 graph layers | |
| Data type | float64 numpy.ndarray | |


In [25]:
ds.temperature.mean().values  # triggers the computation, returns a NumPy array

array(10.00039071)

In [27]:
ds.temperature.mean().load()  # to preserve xarray-ness

**selections are also delayed:**

In [28]:
vals = ds.temperature.isel(z=range(10)).sel(x=1, y=2, method="nearest")
vals

| | Array | Chunk |
|---|---|---|
| Bytes | 80 B | 80 B |
| Shape | (10,) | (10,) |
| Dask graph | 1 chunks in 2113 graph layers | |
| Data type | float64 numpy.ndarray | |


In [29]:
vals.load()

## what about yt?



previously:

1. load in arrays in memory via xarray 
2. use yt generic data loader (`yt.load_uniform_grid(...)`)


**yt_xarray** v0.1.1: yt datasets from xarray datasets

automate (as much as possible) 1 & 2 !

## **yt_xarray** usage overview

yt_xarray provides a `yt` "accessor object":

In [30]:
import yt_xarray

In [31]:
ds.yt

<yt_xarray.accessor.accessor.YtAccessor at 0x7fb640492800>

In [33]:
ds.yt.load_grid?

Signature:
ds.yt.load_grid(
    fields: Optional[List[str]] = None,
    geometry: str = None,
    use_callable: bool = True,
    sel_dict: Optional[dict] = None,
    sel_dict_type: Optional[str] = 'isel',
    chunksizes: Optional[int] = None,
    **kwar

`ds.yt.load_grid`: yt `ds` from xr subset

### Loading all data (not always possible):

In [34]:
ds_yt = ds.yt.load_grid(length_unit="km")

yt_xarray : [INFO ] 2023-02-22 13:17:23,779:  Attempting to detect if yt_xarray will require field interpolation:
yt_xarray : [INFO ] 2023-02-22 13:17:23,780:      Cartesian geometry on uniform grid: yt_xarray will not interpolate.
yt : [INFO     ] 2023-02-22 13:17:23,851 Parameters: current_time              = 0.0
yt : [INFO     ] 2023-02-22 13:17:23,853 Parameters: domain_dimensions         = [510 510 510]
yt : [INFO     ] 2023-02-22 13:17:23,854 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2023-02-22 13:17:23,856 Parameters: domain_right_edge         = [10. 10. 10.]
yt : [INFO     ] 2023-02-22 13:17:23,857 Parameters: cosmological_simulation   = 0


In [35]:
ds_yt.field_list

[('stream', 'gauss'),
 ('stream', 'temperature'),
 ('stream', 'xvals'),
 ('stream', 'yvals'),
 ('stream', 'zvals')]

In [36]:
import yt
slc = yt.SlicePlot(ds_yt, "z", ("stream", "gauss"), center = ds_yt.arr([3., 3., 3.], 'code_length'))
slc.set_log(("stream", "gauss"), False)
slc.show()

yt : [INFO     ] 2023-02-22 13:17:48,088 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:17:48,089 ylim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:17:48,090 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:17:48,090 ylim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:17:48,098 Making a fixed resolution buffer of (('stream', 'gauss')) 800 by 800


### not always so easy...

[**wrf**](https://www.mmm.ucar.edu/models/wrf): "weather research and forecasting" model

cf (Climate and Forecast) compliance of netcdf files: https://cfconventions.org/



wrf is **not** cf compliant...

In [37]:
ds = yt_xarray.open_dataset('wrf/wrfout_d03_2016-06-01.nc')  # checks yt paths
ds

In [38]:
import xwrf  

In [40]:
ds_x = ds.xwrf.postprocess()  # make it cf-compliant-ish
ds_x

Hits all of the yt_xarray challenges:

1. different dimensionality of fields (including time)
2. yt has strict coordinate names (latitude, longitude, altitude), (x, y, z), (r, theta, phi), etc.

### choose a subset of fields

In [42]:

ds_x.yt.load_grid(
    fields=('geopotential', 'geopotential_height')
)

NotImplementedError: Loading data with time as a dimension is not currently supported. Please provide a selection dictionary to select a single time to load.

### choose a time to load

In [43]:
ds_yt = ds_x.yt.load_grid(
    fields=('geopotential', 'geopotential_height'),                      
    sel_dict={'Time':0})

ValueError: z_stag is not a known coordinate. To load in yt, you must supply an alias via the yt_xarray.known_coord_aliases dictionary.

### COORDINATE ALIASING

register an alias for the unknown coordinate, then inspect the dictionary:

In [45]:
yt_xarray.known_coord_aliases["z_stag"] = "z"

In [46]:
yt_xarray.known_coord_aliases

{'altitude': 'altitude',
 'height': 'altitude',
 'level': 'altitude',
 'lev': 'altitude',
 'latitude': 'latitude',
 'lat': 'latitude',
 'longitude': 'longitude',
 'lon': 'longitude',
 'z_stag': 'z'}
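
The aliasing idea can be sketched with a plain dictionary lookup. This is a hypothetical stand-in, not the yt_xarray source: `resolve_coord_name` and the set `yt_coord_names` are names invented here, and the set only lists the yt coordinate names mentioned earlier in this talk.

```python
# a local copy of a few aliases, for illustration only
known_coord_aliases = {
    "lev": "altitude",
    "lat": "latitude",
    "lon": "longitude",
}

# names assumed (for this sketch) to be accepted by yt without an alias
yt_coord_names = {
    "x", "y", "z", "r", "theta", "phi",
    "altitude", "latitude", "longitude",
}

def resolve_coord_name(name):
    """Map a dataset coordinate name to a yt coordinate name."""
    if name in yt_coord_names:
        return name
    if name in known_coord_aliases:
        return known_coord_aliases[name]
    raise ValueError(f"{name} is not a known coordinate; supply an alias.")

known_coord_aliases["z_stag"] = "z"  # same kind of registration as above
print([resolve_coord_name(n) for n in ("lon", "lat", "z_stag")])
```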

In [47]:
ds_yt = ds_x.yt.load_grid(fields=('geopotential', 'geopotential_height'),
                          sel_dict={'Time':0},
                          length_unit='m',
                          use_callable=False)

yt_xarray : [INFO ] 2023-02-22 13:23:54,086:  Inferred geometry type is cartesian. To override, use ds.yt.set_geometry
yt_xarray : [INFO ] 2023-02-22 13:23:54,088:  Attempting to detect if yt_xarray will require field interpolation:
yt_xarray : [INFO ] 2023-02-22 13:23:54,089:      stretched grid detected: yt_xarray will interpolate.
yt : [INFO     ] 2023-02-22 13:23:54,384 Parameters: current_time              = 1.4647392e+18
yt : [INFO     ] 2023-02-22 13:23:54,386 Parameters: domain_dimensions         = [ 39 251 251]
yt : [INFO     ] 2023-02-22 13:23:54,387 Parameters: domain_left_edge          = [      0.         -101500.36132865 -134499.3274438 ]
yt : [INFO     ] 2023-02-22 13:23:54,388 Parameters: domain_right_edge         = [1.00000000e+00 1.49499639e+05 1.16500673e+05]
yt : [INFO     ] 2023-02-22 13:23:54,389 Parameters: cosmological_simulation   = 0


a separate problem remains with the 3D data (a bug: the interpolation goes wrong), so select a single vertical level instead:

In [48]:
ds_yt = ds_x.yt.load_grid(fields=('geopotential', 'geopotential_height'),
                          sel_dict={'Time':0, 'z_stag':4},
                          length_unit='m')   

yt_xarray : [INFO ] 2023-02-22 13:24:21,439:  Attempting to detect if yt_xarray will require field interpolation:
yt_xarray : [INFO ] 2023-02-22 13:24:21,440:      Cartesian geometry on uniform grid: yt_xarray will not interpolate.
yt : [INFO     ] 2023-02-22 13:24:21,487 Parameters: current_time              = 1.4647392e+18
yt : [INFO     ] 2023-02-22 13:24:21,487 Parameters: domain_dimensions         = [252 252   1]
yt : [INFO     ] 2023-02-22 13:24:21,488 Parameters: domain_left_edge          = [-1.02000361e+05 -1.34999327e+05 -5.00000000e-01]
yt : [INFO     ] 2023-02-22 13:24:21,489 Parameters: domain_right_edge         = [1.49999639e+05 1.17000673e+05 5.00000000e-01]
yt : [INFO     ] 2023-02-22 13:24:21,490 Parameters: cosmological_simulation   = 0


finally ... 

In [50]:
slc = yt.SlicePlot(ds_yt, "z", ("stream", "geopotential_height"))
slc.set_log("all", False)

yt : [INFO     ] 2023-02-22 13:25:13,247 xlim = -134999.327444 117000.672556
yt : [INFO     ] 2023-02-22 13:25:13,248 ylim = -102000.361329 149999.638671
yt : [INFO     ] 2023-02-22 13:25:13,251 xlim = -134999.327444 117000.672556
yt : [INFO     ] 2023-02-22 13:25:13,252 ylim = -102000.361329 149999.638671
yt : [INFO     ] 2023-02-22 13:25:13,254 Making a fixed resolution buffer of (('stream', 'geopotential_height')) 800 by 800


**Note**: need to use yt coordinate names for yt functions

**What is [geopotential height](https://legacy.climate.ncsu.edu/images/climate/enso/geo_heights.php)?**: 

* cold air is denser than warm air
* pressure in the atmosphere comes from the weight of the overlying air

geopotential height = the altitude at which a particular pressure is reached
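
The link between temperature and height can be made concrete with the hypsometric equation, which gives the thickness of the layer between two pressure levels as `R_d * T_mean / g0 * ln(p_bottom / p_top)`. A minimal sketch, assuming dry air and a constant layer-mean temperature; the pressures and temperatures are illustrative and not taken from the WRF file.

```python
import math

R_D = 287.05   # J kg^-1 K^-1, gas constant for dry air
G0 = 9.80665   # m s^-2, standard gravity

def layer_thickness(p_bottom, p_top, t_mean):
    """Thickness (m) of the layer between two pressure levels.

    Pressures only enter as a ratio, so any consistent unit works;
    `t_mean` is the layer-mean temperature in kelvin.
    """
    return R_D * t_mean / G0 * math.log(p_bottom / p_top)

cold = layer_thickness(1000.0, 500.0, 250.0)
warm = layer_thickness(1000.0, 500.0, 270.0)
print(f"cold column: {cold:.0f} m, warm column: {warm:.0f} m")
```

The warm column is thicker, so a given pressure surface sits at a higher geopotential height over warm air than over cold air.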


### yt_xarray chunking

load back our dask-xarray ds... 

In [51]:
ds = xr.open_mfdataset(dask_test_ds)
ds

(dataset repr as before: each data variable is a float64 dask array of shape (510, 510, 510), 0.99 GiB, in 1000 chunks of shape (51, 51, 51))


specify the chunksizes to use for the **yt** grids:

In [52]:
yt_ds = ds.yt.load_grid(fields=("gauss",), length_unit='m', chunksizes=51)

yt_xarray : [INFO ] 2023-02-22 13:26:04,981:  Inferred geometry type is cartesian. To override, use ds.yt.set_geometry
yt_xarray : [INFO ] 2023-02-22 13:26:04,982:  Attempting to detect if yt_xarray will require field interpolation:
yt_xarray : [INFO ] 2023-02-22 13:26:04,983:      Cartesian geometry on uniform grid: yt_xarray will not interpolate.
yt_xarray : [INFO ] 2023-02-22 13:26:04,984:  Constructing a yt chunked grid with 1000 chunks.
yt : [INFO     ] 2023-02-22 13:26:05,050 Parameters: current_time              = 0.0
yt : [INFO     ] 2023-02-22 13:26:05,051 Parameters: domain_dimensions         = [510 510 510]
yt : [INFO     ] 2023-02-22 13:26:05,052 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2023-02-22 13:26:05,053 Parameters: domain_right_edge         = [10. 10. 10.]
yt : [INFO     ] 2023-02-22 13:26:05,054 Parameters: cosmological_simulation   = 0


In [54]:
len(yt_ds.index.grids)

1000

In [56]:
import yt
slc = yt.SlicePlot(yt_ds, "z", ("stream", "gauss"), center=yt_ds.arr([3., 3., 3.], 'code_length'))
slc.annotate_grids()
slc.show()

yt : [INFO     ] 2023-02-22 13:27:52,217 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:27:52,218 ylim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:27:52,218 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:27:52,219 ylim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:27:52,221 Making a fixed resolution buffer of (('stream', 'gauss')) 800 by 800


**each yt grid = dask chunk = on disk .nc file**

but chunk alignment is not guaranteed... 

In [57]:
import yt_xarray
yt_ds = ds.yt.load_grid(fields=("gauss",), length_unit='m', chunksizes=102)
slc = yt.SlicePlot(yt_ds, "z", ("stream", "gauss"), center=yt_ds.arr([3., 3., 3.], 'code_length'))
slc.annotate_grids()
slc.show()

yt_xarray : [INFO ] 2023-02-22 13:28:02,752:  Attempting to detect if yt_xarray will require field interpolation:
yt_xarray : [INFO ] 2023-02-22 13:28:02,753:      Cartesian geometry on uniform grid: yt_xarray will not interpolate.
yt_xarray : [INFO ] 2023-02-22 13:28:02,754:  Constructing a yt chunked grid with 125 chunks.
yt : [INFO     ] 2023-02-22 13:28:02,796 Parameters: current_time              = 0.0
yt : [INFO     ] 2023-02-22 13:28:02,797 Parameters: domain_dimensions         = [510 510 510]
yt : [INFO     ] 2023-02-22 13:28:02,798 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2023-02-22 13:28:02,799 Parameters: domain_right_edge         = [10. 10. 10.]
yt : [INFO     ] 2023-02-22 13:28:02,801 Parameters: cosmological_simulation   = 0
yt : [INFO     ] 2023-02-22 13:28:04,943 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:28:04,943 ylim = -2.000000 8.000000
yt : [INFO     ] 2023-02-22 13:28:04,944 xlim = -2.000000 8.000000
yt : [INFO     ] 2023-0

possible feature? 

but working to auto-align... 
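
A rough way to see the alignment issue is to count how many dask chunks each yt chunk overlaps along one dimension. This is a sketch under stated assumptions: both helpers are hypothetical, and the last chunk is assumed to take whatever cells remain when the chunk size does not divide the dimension evenly (the package may handle that case differently).

```python
import numpy as np

def chunk_bounds(n_cells, chunksize):
    """Start indices of chunks of `chunksize` cells, plus the final end index."""
    return np.append(np.arange(0, n_cells, chunksize), n_cells)

def dask_chunks_touched(n_cells, dask_chunksize, yt_chunksize):
    """For each yt chunk along one dimension, count the dask chunks it overlaps."""
    yt_b = chunk_bounds(n_cells, yt_chunksize)
    first = yt_b[:-1] // dask_chunksize       # dask chunk holding the first cell
    last = (yt_b[1:] - 1) // dask_chunksize   # dask chunk holding the last cell
    return last - first + 1

for yt_size in (51, 102, 64):
    touched = dask_chunks_touched(510, 51, yt_size)
    # yt chunk size, total yt chunks in 3D, dask chunks read per yt chunk (1D)
    print(yt_size, len(touched) ** 3, touched.tolist())
```

With a yt chunk size of 51 every yt grid maps to exactly one dask chunk (one file); with 102 each grid spans two dask chunks per dimension; with a size such as 64 some grids straddle three.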

([ChunkWalker prototype](https://github.com/chrishavlin/yt-xarray-dask-sandbox/blob/main/daxryt/chunk_inspector.py)), `._recursive_chonker` walks dask-xr chunks:


In [59]:
ds.gauss.chunksizes  # dask-related chunk attributes

Frozen({'x': (51, 51, 51, 51, 51, 51, 51, 51, 51, 51), 'y': (51, 51, 51, 51, 51, 51, 51, 51, 51, 51), 'z': (51, 51, 51, 51, 51, 51, 51, 51, 51, 51)})

dask only records the chunk sizes along each dimension; yt needs the physical left and right edges (and potentially the cell widths) of each chunk.
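
Getting from chunk sizes to physical edges is mostly a cumulative sum over the chunk sizes plus a lookup in the in-memory coordinate. A minimal sketch for one dimension, assuming the coordinate holds cell centers on a uniform grid; `chunk_edges` is a hypothetical helper and the coordinate is made up.

```python
import numpy as np

def chunk_edges(coord, chunks):
    """Left/right physical edges of each chunk along one dimension.

    Assumes `coord` holds cell centers on a uniform grid.
    """
    coord = np.asarray(coord, dtype=float)
    dx = coord[1] - coord[0]
    stops = np.cumsum(chunks)             # exclusive end index of each chunk
    starts = stops - np.asarray(chunks)   # start index of each chunk
    left = coord[starts] - dx / 2
    right = coord[stops - 1] + dx / 2
    return left, right

# made-up coordinate: 12 cell centers covering [0, 6], split into 3 chunks
x = np.linspace(0.25, 5.75, 12)
left, right = chunk_edges(x, (4, 4, 4))
print(left, right)
```

The `chunks` argument has the same form as one entry of `ds.gauss.chunksizes`, e.g. a tuple of ten 51s.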

# possible yt_xarray future

performance-focused:
* auto-alignment of dask and yt chunks
* dedicated frontend instead of stream loaders? 
* improve support for stretched grids: chunking (yt), delayed reads (yt_xarray)

enhancing user experience:
* allow arbitrary coordinate names (yt)
* wrap yt functions?

```python
ds.yt.SlicePlot(normal_axis, fields, ....)
```



# yt_xarray implementation details

loads data as yt Stream frontend via `load_amr_grids`:


```python
yt.load_amr_grids(
        grid_data,  # the data OR FUNCTION for the grid(s)
        data_shp,   # global grid shape, (Nx, Ny, Nz)
        geometry=geom,  # e.g., ('cartesian', ('x', 'z', 'y'))
        bbox=bbox,  # the bounding box
        length_unit=length_unit,  
        **kwargs,
    )
```    

Form of `grid_data` depends on:

* grid type (uniform, stretched)
* memory management: delayed reads (`use_callable`) vs in memory
* chunking
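
The "function instead of data" option can be sketched as a closure that turns a grid's physical edges into index ranges and only then slices the source array. Everything here is an assumption made for illustration: `make_reader` is hypothetical, the callable is assumed to receive a grid-like object and a `(field_type, field_name)` tuple, and a `SimpleNamespace` stands in for a yt grid.

```python
import numpy as np
from types import SimpleNamespace

def make_reader(source, origin, cell_width):
    """Return a function that reads one grid's worth of a field on demand.

    `source` maps field names to array-like objects (in memory here; a lazy
    array would only be read when it is sliced).
    """
    def reader(grid, field_name):
        _, fname = field_name
        # convert the grid's physical edges to index ranges
        lo = np.rint((np.asarray(grid.LeftEdge) - origin) / cell_width).astype(int)
        hi = np.rint((np.asarray(grid.RightEdge) - origin) / cell_width).astype(int)
        slices = tuple(slice(a, b) for a, b in zip(lo, hi))
        return np.asarray(source[fname][slices])
    return reader

field = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
reader = make_reader({"gauss": field}, origin=np.zeros(3), cell_width=np.full(3, 0.25))

# stand-in for a grid covering the upper half of the x range
grid = SimpleNamespace(LeftEdge=[0.5, 0.0, 0.0], RightEdge=[1.0, 1.0, 1.0])
block = reader(grid, ("stream", "gauss"))
print(block.shape)
```

Because the slicing happens inside `reader`, nothing is read until a grid actually asks for its data, which is the point of delayed reads.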

## general outline

```python
ds_xr.yt.load_grid(fields,
                   sel_dict={'time': 0},  # plus any other dimensions to reduce
                   chunksizes=64,
                   use_callable=True)     # or False to read the fields into memory
```


1. check that fields have valid dims

2. apply the selection dictionary to the field **COORDINATES**, record yt grid details

3. if `use_callable`, initialize the reader function, otherwise read the data fields

4. if chunksizes > 1, split the array (again by **COORDINATES**) and use `load_amr_grids`. otherwise, use `load_uniform_grid`

5. stretched grid corrections...
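
Steps 1 and 2 of the outline can be sketched without touching any data, since only dimension names are involved. This is not the yt_xarray implementation: the function names are hypothetical, every `sel_dict` entry is assumed to select a single index (so the dimension disappears), and the dimension names in the example are illustrative rather than read from a file.

```python
def remaining_dims(field_dims, sel_dict):
    """Dimensions left for each field after applying a selection dictionary."""
    return {
        name: tuple(d for d in dims if d not in sel_dict)
        for name, dims in field_dims.items()
    }

def check_fields(field_dims, sel_dict, time_names=("time", "Time")):
    """Raise if the selected fields cannot share one 2D or 3D grid."""
    reduced = remaining_dims(field_dims, sel_dict)
    shapes = set(reduced.values())
    if len(shapes) != 1:
        raise ValueError(f"fields do not share dimensions: {reduced}")
    dims = shapes.pop()
    if any(t in dims for t in time_names):
        raise NotImplementedError("select a single time with sel_dict")
    if len(dims) not in (2, 3):
        raise ValueError(f"expected 2 or 3 dimensions, got {dims}")
    return dims

# illustrative dimension names, not read from a file
field_dims = {
    "geopotential": ("Time", "z_stag", "y", "x"),
    "geopotential_height": ("Time", "z_stag", "y", "x"),
}
print(check_fields(field_dims, {"Time": 0}))               # 3D grid
print(check_fields(field_dims, {"Time": 0, "z_stag": 4}))  # 2D slice
```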

## code tour (if time)

We'll look at:

* `yt_xarray.accessor.accessor.YtAccessor` : the top level accessor object
* `yt_xarray.accessor._xr_to_yt.Selection` : yt-xr translation, mapping of selections
* `yt_xarray.accessor._readers._get_xarray_reader`: building a function to load the data when needed
