# Explore the Analysis Of Record for Calibration (AORC) version 1.1 data
###### Using Xarray, Dask and hvPlot to explore the AORC version 1.1 data. We read from a cloud-optimized Zarr dataset that is part of the NOAA Open Data Dissemination (NODD) program and we use a Dask cluster to parallelize the computation and reading of data chunks.


#### AORC variables available to use:
 - APCP_surface
 - DLWRF_surface
 - DSWRF_surface
 - PRES_surface
 - SPFH_2maboveground
 - TMP_2maboveground
 - UGRD_10maboveground
 - VGRD_10maboveground


***

##### Common imports and base URL for all tests

In [1]:
import xarray as xr
import fsspec
import numpy as np
import s3fs
import zarr

In [2]:
base_url = 's3://noaa-nws-aorc-v1-1-1km'

## Load a single year
###### Set the year variable to the year you want to load

In [3]:
year = '1979'

In [4]:
single_year_url = f'{base_url}/{year}.zarr/'
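
###### The URL pattern above (bucket, then the year, then a .zarr suffix) can be wrapped in a small helper so that a mistyped year fails before any request is made. This is only a sketch: zarr_url_for_year is a hypothetical name, and the four-digit check is an assumption about what a valid year looks like, not a rule taken from the dataset.

```python
BASE_URL = "s3://noaa-nws-aorc-v1-1-1km"  # same bucket as base_url above


def zarr_url_for_year(year, base_url=BASE_URL):
    """Return the Zarr store URL for one year (hypothetical helper).

    `year` may be an int or a string. The four-digit check is an assumed
    sanity check, not a rule documented for the dataset.
    """
    year_text = str(year).strip()
    if not (year_text.isdigit() and len(year_text) == 4):
        raise ValueError(f"expected a four-digit year, got {year!r}")
    # Strip any trailing slash so the result never contains '//' after the bucket.
    return f"{base_url.rstrip('/')}/{year_text}.zarr/"


single_year_url = zarr_url_for_year(1979)
print(single_year_url)
```

###### The helper only builds the string. It does not check that the year exists in the bucket.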

In [None]:
%%time
ds_single = xr.open_zarr(fsspec.get_mapper(single_year_url, anon=True), consolidated=True)

###### Set var to the AORC variable you want to inspect. The full list of available variables is at the top of the notebook

In [6]:
var='APCP_surface'

In [None]:
ds_single[var]

In [None]:
print(f'Variable size: {ds_single[var].nbytes/1e12:.1f} TB')
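
###### The nbytes value is the uncompressed size: the number of elements multiplied by the bytes per element. The arithmetic can be reproduced by hand, as in the sketch below. The shape and element size are illustrative assumptions (hourly steps for one non-leap year on a made-up grid of 32-bit values), not numbers read from the AORC store.

```python
import math

# Hypothetical (time, latitude, longitude) shape, chosen only to illustrate
# the arithmetic. It is NOT read from the AORC dataset.
shape = (8760, 4000, 8000)  # 8760 = 365 days * 24 hourly steps (assumed)
itemsize = 4                # bytes per element, assuming 32-bit values

# Uncompressed size = number of elements * bytes per element.
nbytes = math.prod(shape) * itemsize
size_tb = nbytes / 1e12     # decimal terabytes, matching the 1e12 divisor above

print(f"Variable size: {size_tb:.1f} TB")
```

###### Because this is the uncompressed size, the compressed chunks stored in the bucket can be considerably smaller than this figure.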

#### To write the Zarr dataset to a netCDF file, use the ds_single variable created above

#### Start a Dask cluster
###### This is not required but it speeds up computations.  Here we start a local cluster that uses the cores available on the computer running the notebook server.  There are many other ways to set up Dask clusters that can scale larger than this.
###### If you are running this on your local machine, add dask.config.set(temporary_directory='/dask-worker-space') on the line after import dask

In [None]:
import dask
from dask.distributed import Client
client = Client()
client

In [10]:
# to_netcdf writes a netCDF file, so use a .nc extension rather than .zarr
filename = f'/wrds-data/test/{year}.nc'

In [None]:
ds_single.to_netcdf(filename, 'w')

In [None]:
print('finished')

***

## Load multiple years

###### Set dataset_years to the range of years needed. The first argument to range is the starting year and the second is the ending year + 1, because range excludes its upper bound

In [3]:
dataset_years = list(range(2018,2023))
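
###### Because range stops before its second argument, it is easy to drop the last year by accident. One option is a small wrapper that takes both endpoints inclusively. This is a sketch, and years_inclusive is a hypothetical helper name.

```python
def years_inclusive(first_year, last_year):
    """Return every year from first_year through last_year, both included."""
    if last_year < first_year:
        raise ValueError("last_year must not be earlier than first_year")
    # range() excludes its upper bound, hence the + 1.
    return list(range(first_year, last_year + 1))


dataset_years = years_inclusive(2018, 2022)
print(dataset_years)  # [2018, 2019, 2020, 2021, 2022]
```

###### This gives the same list as list(range(2018, 2023)).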


###### Next, create a list variable (fileset) that holds one store per year, in sorted order. This can be done via two methods; the one used here is mapping


In [4]:
s3_out = s3fs.S3FileSystem(anon=True)
fileset = [s3fs.S3Map(
            # base_url already starts with 's3://', so do not add the prefix again
            root=f"{base_url}/{dataset_year}.zarr", s3=s3_out, check=False
        ) for dataset_year in dataset_years]


In [None]:
fileset

In [5]:
%%time
ds_multi_year = xr.open_mfdataset(fileset, engine='zarr')

###### Set var to the AORC variable you want to inspect. The full list of available variables is at the top of the notebook

In [None]:
var='APCP_surface'

In [None]:
ds_multi_year[var]

In [None]:
print(f'Variable size: {ds_multi_year[var].nbytes/1e12:.1f} TB')