<span style='color:#009999'> <span style='font-family:serif'> <font size="15"> **Using earthaccess to access data via Hyrax's DMR++**<span style='color:#0066cc'> 



<span style='color:#ff6666'><font size="5">**Requirements**
1. <font size="3"><span style='color:Black'> An active Earthdata Login (EDL) account.

`earthaccess` has its own authentication mechanism, which uses your EDL login credentials.
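
As a rough pre-flight check, the sketch below uses only the Python standard library to see whether a `.netrc` file already holds an entry for the EDL host `urs.earthdata.nasa.gov`, which is one of the places `earthaccess` can read credentials from. The helper name `has_edl_netrc_entry` is hypothetical and not part of `earthaccess`, and the default path `~/.netrc` is an assumption that may not hold on every platform.

```python
import netrc
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"


def has_edl_netrc_entry(netrc_path=None):
    """Return True if the netrc file contains credentials for the EDL host.

    Hypothetical helper for illustration; not part of earthaccess.
    """
    # Assumption: credentials live in ~/.netrc unless another path is given
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        # authenticators() returns None when no matching entry exists
        return netrc.netrc(str(path)).authenticators(EDL_HOST) is not None
    except netrc.NetrcParseError:
        # A malformed file is treated the same as a missing entry
        return False


print("EDL entry found in .netrc:", has_edl_netrc_entry())
```

If the check returns `False`, calling `earthaccess.login()` should fall back to other strategies such as environment variables or an interactive prompt, and `earthaccess.login(persist=True)` can save the credentials for later sessions.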

 <span style='color:#ff6666'><font size="5"> **OPeNDAP, DMR++ and VirtualiZarr**


<font size="3"><span style='color:Black'> This notebook uses [earthaccess](https://earthaccess.readthedocs.io/en/latest/), [VirtualiZarr](https://virtualizarr.readthedocs.io/en/latest/) and [xarray](https://docs.xarray.dev/en/stable/) to access NASA's cloud-hosted files, currently stored on `S3`. [earthaccess](https://earthaccess.readthedocs.io/en/latest/) has built-in support for reading OPeNDAP in the Cloud's `DMR++` metadata directly, rather than going through OPeNDAP's Hyrax data server. The DMR++ is then translated to Zarr metadata via [VirtualiZarr](https://virtualizarr.readthedocs.io/en/latest/), which can provide a substantial performance boost whether running locally or in a cloud compute environment.

<span style='color:#0066cc'><font size="3.5"> **open_virtual_dataset**: 

[earthaccess](https://earthaccess.readthedocs.io/en/latest/) lets data users convert the DMR++ metadata produced by Hyrax in the Cloud into cloud-optimized reference files for the data stored in the cloud. This is done via:

- `earthaccess.open_virtual_dataset`
- `earthaccess.open_virtual_mfdataset`


<span style='color:#0066cc'><font size="3.5"> **access="indirect" vs access="direct"**: 


This tutorial loads data over `https` (`access="indirect"`). However, there is a **significant speed improvement** when these functions run in-cloud with `access="direct"`, for example on managed cloud JupyterHubs like [NASA VEDA](https://www.earthdata.nasa.gov/dashboard/) or [2i2c Openscapes](https://workshop.openscapes.2i2c.cloud/hub/login?next=%2Fhub%2F). In that setting the data is streamed directly from cloud storage to cloud compute.
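
A minimal sketch of switching between the two modes without editing every call. Both the helper `pick_access_mode` and the environment variable `NOTEBOOK_IN_CLOUD` are hypothetical names introduced here for illustration; `earthaccess` does not read them.

```python
import os


def pick_access_mode(in_cloud=None):
    """Return "direct" for in-cloud compute, otherwise "indirect".

    Hypothetical helper; the caller decides what counts as "in-cloud".
    """
    if in_cloud is None:
        # Assumption: the user sets NOTEBOOK_IN_CLOUD=1 on cloud JupyterHubs
        flag = os.environ.get("NOTEBOOK_IN_CLOUD", "")
        in_cloud = flag.strip().lower() in {"1", "true", "yes"}
    return "direct" if in_cloud else "indirect"


access = pick_access_mode()
print("Using access mode:", access)
```

The returned string can then be passed as the `access` argument of `earthaccess.open_virtual_dataset` or `earthaccess.open_virtual_mfdataset`.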

 <span style='color:#ff6666'><font size="5"> **Objectives**
 
 
- <font size="3"><span style='color:Black'> Demonstrate how to use [earthaccess](https://earthaccess.readthedocs.io/en/latest/) to query datasets that are available via `OPeNDAP` in the Cloud.
- <font size="3"><span style='color:Black'> Demonstrate the use of [earthaccess](https://earthaccess.readthedocs.io/en/latest/) to create a virtually aggregated xarray data cube, making use of the Zarr metadata created from DMR++.
- <font size="3"><span style='color:Black'> Demonstrate an advanced workflow for storing the virtual references as a Kerchunk object for later use.


<span style='color:#0066cc'><font size="3.5"> **WARNING**: 

This feature is currently experimental and may change in the future. It relies on `NASA` / `OPeNDAP` **DMR++** metadata files, which may not always be present for your dataset; when they are missing you may get a `FileNotFoundError`.
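
One way to cope with missing DMR++ files is to open granules one at a time and set aside those that raise `FileNotFoundError`. The sketch below shows that pattern with a stand-in opener so it runs without network access; `open_available` and `fake_opener` are hypothetical names, not part of `earthaccess`.

```python
def open_available(granules, opener):
    """Open each granule with `opener`, skipping those without DMR++ metadata.

    Returns a pair: (successfully opened objects, granules that were skipped).
    """
    opened, missing = [], []
    for granule in granules:
        try:
            opened.append(opener(granule))
        except FileNotFoundError:
            # No DMR++ sidecar file was found for this granule
            missing.append(granule)
    return opened, missing


def fake_opener(granule):
    """Stand-in for a real opener; fails for names ending in '-no-dmrpp'."""
    if granule.endswith("-no-dmrpp"):
        raise FileNotFoundError(granule)
    return f"virtual dataset for {granule}"


opened, missing = open_available(["g1", "g2-no-dmrpp", "g3"], fake_opener)
print(len(opened), "opened;", len(missing), "skipped")
```

In practice the opener could be something like `lambda g: earthaccess.open_virtual_dataset(g, access="indirect")`, assuming the error surfaces at that call.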




<span style='color:#0066cc'><font size="3.5"> **Additional References**: 


* [Cloud optimized access to NASA data with earthaccess and virtualizarr](https://earthaccess.readthedocs.io/en/latest/tutorials/dmrpp-virtualizarr/).

* [Nag, Ayush, Gallagher, James. (August, 2024). VirtualiZarr and DMR++. Zenodo. https://doi.org/10.5281/zenodo.13176038](https://doi.org/10.5281/zenodo.13176038).

* [Gallagher, James, Yang, Kent, Lee, Hyokyung. (November, 2024). High-Performance Access to Archival Data Stored in HDF4 and HDF5 on Cloud Object Stores Without Reformatting the Files. Zenodo. https://doi.org/10.5281/zenodo.14232491](https://doi.org/10.5281/zenodo.14232491).

In [None]:
import earthaccess
import xarray as xr

In [None]:
print("`earthaccess` version: ", earthaccess.__version__)

### NASA JPL Multiscale Ultrahigh Resolution (MUR) Sea Surface Temperature (SST) dataset - 0.01 degree resolution

We now search for NASA JPL MUR SST data. For that we need:
- a temporal range
- the short name of the collection
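
The temporal range is passed as a `(start, end)` pair of date strings. As a small convenience, the sketch below builds such a pair for a full calendar month using only the standard library; `month_range` is a hypothetical helper, not an `earthaccess` function.

```python
import calendar
from datetime import date


def month_range(year, month):
    """Return (first_day, last_day) of a month as ISO date strings."""
    # monthrange() returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    start = date(year, month, 1).isoformat()
    end = date(year, month, last_day).isoformat()
    return (start, end)


temporal = month_range(2010, 1)
print(temporal)
```

The resulting tuple has the same shape as the `temporal` argument used in the search call of this notebook.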

In [None]:
results = earthaccess.search_data(
    temporal=("2010-01-01", "2010-01-31"), short_name="MUR-JPL-L4-GLOB-v4.1"
)
len(results)

### Access DMR++ and create a virtual xarray object
We set `access` to one of:
- `access="indirect"` when running this notebook on Binder or a local machine.
- `access="direct"` when running this notebook on an EC2 instance, to make the best use of DMR++, xarray, and Dask.





In [None]:
%%time
mur = earthaccess.open_virtual_mfdataset(
    results,
    access="indirect",
    load=True,  # Dimension coordinates are loaded into memory
    concat_dim="time",
    coords="all",
    compat="override",
    combine_attrs="drop_conflicts",
)
mur

In [None]:
print(f"This created a virtual reference pointing to {mur.nbytes / 1e9:.2f} GB of data in the cloud!")

## We now plot some data

Plotting actually triggers the download and computation of the selected data.

NOTE:

* The dimensions are loaded into memory, so we can manipulate them.
* Here the dimensions are also coordinates (this is not always the case), so we can subset by spatial lat/lon values.
* We can also subset by time, since time is a dimension.
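
To get a feel for how much data a lat/lon subset touches, the sketch below estimates the number of grid points in a bounding box on a 0.01 degree grid. It assumes that grid points fall exactly on multiples of the resolution and that both ends of the range are included, as with label-based slices in xarray; `points_in_range` is a hypothetical helper.

```python
def points_in_range(start, stop, resolution=0.01):
    """Count regular grid points between start and stop, endpoints included.

    Assumes start and stop lie exactly on the grid.
    """
    return round((stop - start) / resolution) + 1


# Example bounding box: latitude 20 to 45, longitude -95 to -50
n_lat = points_in_range(20, 45)
n_lon = points_in_range(-95, -50)
print(f"{n_lat} x {n_lon} = {n_lat * n_lon} grid points per time step")
```

The actual shape of a subset depends on the dataset's coordinate values, so the result is best treated as an estimate and compared against the subset's reported dimensions.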


In [None]:
%%time
spatial_subset = mur.isel(time=0).sel(lat=slice(20, 45), lon=slice(-95, -50))
spatial_subset

In [None]:
%%time
spatial_subset["analysed_sst"].plot.pcolormesh(x="lon", y="lat", cmap="RdBu_r", figsize=(8, 4));