# Using OPeNDAP to Access Data from the Earthdata Cloud Archives

### Overview

This notebook demonstrates how to access OPeNDAP granules hosted in the Earthdata Cloud Archives. It shows how to search for Cloud OPeNDAP-hosted Daymet granules using the `earthaccess` library, and then how to access and analyze them remotely using Xarray and ncdump.

### Review: What is OPeNDAP?

OPeNDAP, or the [Open-source Project for a Network Data Access Protocol](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api/opendap), provides remote access to scientific datasets over the public internet. It uses Data Access Protocols (DAP) and the [Hyrax Data Server](https://www.opendap.org/software/hyrax-data-server) to distribute data and metadata to various clients and utilities, including Python. NASA and its [Distributed Active Archive Centers (DAACs)](https://www.earthdata.nasa.gov/eosdis/daacs) are migrating their on-premises OPeNDAP Hyrax servers to the cloud, where granules are now organized by DAAC and collection [Concept ID](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-concept-id). This notebook shows how to search for [Daymet](https://daymet.ornl.gov/) Cloud OPeNDAP granules by DOI, and then how to view and plot their data through several methods, including [Xarray](https://docs.xarray.dev/en/stable/) and [ncdump](https://www.unidata.ucar.edu/software/netcdf/workshops/2011/utilities/Ncdump.html).

### Prerequisites

- A valid [Earthdata Login account](https://urs.earthdata.nasa.gov/)
    - Generation of the `.netrc` and `.dodsrc` files (both files will be generated in this notebook)
- Python 3.10 or higher
- [Xarray](https://docs.xarray.dev/en/stable/)
- [earthaccess](https://earthaccess.readthedocs.io/en/latest/)
- [pydap](https://pydap.github.io/pydap/en/intro.html) 
- netcdf-c version == 4.9.0, or >=4.9.3
    - To check the version of your netcdf-c library, import the `netCDF4` Python package and call `netCDF4.getlibversion()`
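
The netcdf-c version rule above can also be checked programmatically. The following is a minimal sketch and not part of the original notebook: `netcdf_c_is_supported` is a hypothetical helper, and it assumes the string returned by `netCDF4.getlibversion()` begins with a dotted `major.minor.patch` version number.

```python
import re


def netcdf_c_is_supported(lib_version: str) -> bool:
    """Return True if netcdf-c is exactly 4.9.0, or 4.9.3 or newer."""
    # Assumption: the version string starts with "major.minor.patch"
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", lib_version)
    if match is None:
        raise ValueError(f"Could not parse a version from {lib_version!r}")
    version = tuple(int(part) for part in match.groups())
    return version == (4, 9, 0) or version >= (4, 9, 3)


# Illustrative version strings only
print(netcdf_c_is_supported("4.9.0"))  # True
print(netcdf_c_is_supported("4.9.2"))  # False
print(netcdf_c_is_supported("4.9.3"))  # True
```

In the notebook environment, the helper could be called as `netcdf_c_is_supported(netCDF4.getlibversion())` after `import netCDF4`.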

***Optional Anaconda Environment YAML***:
This notebook can be run using the ['nasa-gesdisc' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/nasa-gesdisc.yml) provided in the 'environments' subfolder. 
Please follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment.


**If you are running this notebook in the 2i2c JupyterHub,** you will need to downgrade the NumPy and netCDF4-python libraries for DAP4 compatibility. Please uncomment the commands in the next cell (remove the enclosing triple quotes) and run it before running the remaining cells:

In [None]:
# Uncomment the following commands and run only if you are using the 2i2c Jupyterhub to run this notebook:
'''
! pip install -U netcdf4==1.6.2
! pip install -U numpy==1.25.2

# Automatically restart the kernel after package installation
import IPython
app = IPython.get_ipython()
app.kernel.do_shutdown(restart=True)
'''

---

## 1. Import Packages

In [None]:
import xarray as xr
import earthaccess
from pydap.net import create_session
%matplotlib inline

## 2. Create Earthdata Login (EDL) Files Using the `earthaccess` Python Library

First, pass your Earthdata credentials to the `earthaccess` library to create the `.netrc` and `.dodsrc` files:

In [None]:
auth = earthaccess.login(strategy="interactive", persist=True) 
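
After logging in, it can be useful to confirm that the credential files exist. The sketch below is an illustration rather than part of the original notebook: `edl_files_present` is a hypothetical helper, and it assumes both files are written to the user's home directory under the names `.netrc` and `.dodsrc` (on some systems the netrc file may be named differently).

```python
from pathlib import Path


def edl_files_present(home: Path) -> dict:
    """Report which Earthdata Login credential files exist in `home`."""
    # Assumption: both files live directly in the given directory
    return {name: (home / name).is_file() for name in (".netrc", ".dodsrc")}


# Check the current user's home directory
print(edl_files_present(Path.home()))
```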

## 3. Searching for Daymet Cloud OPeNDAP Granules using the CMR and `earthaccess`

Daymet daily data files (or granules) are in netCDF4 format, and each file holds one year of data. Data files are organized by variable (one file each for dayl, prcp, tmin, tmax, srad, swe, vp) and by region (one file each for na, pr, hi). The region can be identified from the filename: files for continental North America are named `*_na_*.nc`, and files for Puerto Rico and Hawaii are named `*_pr_*.nc` and `*_hi_*.nc`, respectively.

We will first search for all granules in the collection over the full spatial extent and the time period of interest (2010 and 2011) using the `earthaccess` library. The `earthaccess` library passes our search parameters to the CMR and returns metadata for each matching granule. In this tutorial, the maximum temperature (`tmax`) granules are selected from these results in the next section.

In [None]:
# Query CMR for OPeNDAP links
results = earthaccess.search_data(
    doi="10.3334/ORNLDAAC/2129",
    temporal=('2010-01-01', '2011-12-31'),
)
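
Because each Daymet filename encodes its region and variable, filenames can be parsed to summarize what a search returned. The sketch below is illustrative only: the helper names are hypothetical, the example filenames are made up, and the pattern `..._<region>_<variable>_<year>.nc` is an assumption based on the naming convention described above.

```python
import re
from collections import Counter

# Assumed filename pattern: ..._<region>_<variable>_<year>.nc
DAYMET_NAME = re.compile(r"_(na|pr|hi)_(dayl|prcp|srad|swe|tmax|tmin|vp)_(\d{4})\.nc")


def parse_daymet_name(name: str):
    """Return (region, variable, year) parsed from a filename, or None."""
    match = DAYMET_NAME.search(name)
    if match is None:
        return None
    region, variable, year = match.groups()
    return region, variable, int(year)


def count_by_region_and_variable(names):
    """Count filenames per (region, variable), skipping names that do not match."""
    parsed = (parse_daymet_name(name) for name in names)
    return Counter((p[0], p[1]) for p in parsed if p is not None)


# Hypothetical filenames, for illustration only
example_names = [
    "daymet_v4_daily_hi_tmax_2010.nc",
    "daymet_v4_daily_hi_tmax_2011.nc",
    "daymet_v4_daily_na_prcp_2010.nc",
]
print(count_by_region_and_variable(example_names))
```

With real search results, the same helpers could be applied to the URLs found under `item['umm']['RelatedUrls']` for each granule.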

## 4. Build DAP4 URLs and Create an Authenticated Session

Cloud OPeNDAP-enabled granules are accessed with the "[DAP4](https://opendap.github.io/dap4-specification/DAP4.html)" protocol, which is selected here by using the `dap4://` URL scheme in place of `https://`. DAP4 allows granules to organize their variables into group hierarchies, allows complex variable names to be retrieved, and makes it easier to distinguish dataset variables from one another. Because on-premises OPeNDAP servers used DAP2, certain programming libraries may require updated methods for accessing Cloud OPeNDAP-enabled granules.

Before any granule is accessed, we replace the URL scheme in each OPeNDAP link, append a constraint expression that subsets the request, and then create an authenticated session with `pydap`.

In [None]:
# Set the variable and region you want to extract
region = "hi"
variable = "tmax"

# Initialize list of valid OPeNDAP URLs
opendap_urls = []

# Parse each result for valid OPeNDAP URLs
for item in results:
    for related_url in item['umm']['RelatedUrls']:
        if 'OPENDAP' in related_url.get('Description', '').upper():
            # Swap only the leading URL scheme so that DAP4 is used
            url = related_url['URL'].replace('https://', 'dap4://', 1)

            # Filter: only include URLs whose filename contains the desired region and variable
            if f"{region}_{variable}" in url:
                # Constraint expression: request only the desired variable, lat, lon, time
                # (%3B is the URL-encoded ";" that separates the requested variables)
                ce = "?dap4.ce=/{}%3B/lat%3B/lon%3B/time".format(variable)
                url += ce

                # Add to list
                opendap_urls.append(url)

# Use netrc file to authenticate
my_session = create_session()
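
The URL rewriting done in the loop above can also be expressed as a small standalone function. This is a minimal sketch, not the notebook's own method: `build_dap4_url` is a hypothetical helper and the example URL is made up. It uses `urllib.parse.quote` to percent-encode the `;` separators as `%3B`.

```python
from urllib.parse import quote


def build_dap4_url(https_url: str, variables) -> str:
    """Swap the leading https scheme for dap4 and append a dap4.ce constraint."""
    if not https_url.startswith("https://"):
        raise ValueError("Expected a URL that starts with https://")
    dap4_url = "dap4://" + https_url[len("https://"):]
    # Join the variable paths with ";" and percent-encode it as %3B
    constraint = quote(";".join(f"/{name}" for name in variables), safe="/")
    return f"{dap4_url}?dap4.ce={constraint}"


# Hypothetical granule URL, for illustration only
example = "https://opendap.example.org/granules/daymet_v4_daily_hi_tmax_2010.nc"
print(build_dap4_url(example, ["tmax", "lat", "lon", "time"]))
```

Replacing only the leading scheme avoids accidentally altering any later occurrence of `https` in the URL.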

## 5. Open and Subset Granules Using Xarray

Xarray is a widely used Python library for accessing and analyzing remotely hosted datasets. Below, we use the `open_mfdataset()` function to open the selected Cloud OPeNDAP Daymet granules as a single dataset and to view its metadata.

**NOTE:** Occasionally, due to server load, "BES Connection" errors may appear while opening a Cloud OPeNDAP granule. These errors do not affect granule access. 

In [None]:
# Send DAP4 requests and open all selected granules as a single xarray dataset
ds = xr.open_mfdataset(opendap_urls, engine="pydap", session=my_session)
ds

## 6. Resample and Plot tmax

Below, we resample the `tmax` variable to monthly means using Xarray's built-in functions. Then, we plot the mean `tmax` over Hawaii for July 2010.

In [None]:
# Monthly resample ("ME" is the month-end frequency alias; older pandas versions may need "M")
monthly_tmax_mean = ds['tmax'].resample(time="ME").mean()
monthly_tmax_mean

In [None]:
# Index 6 along time is July 2010, because the monthly series starts in January 2010
monthly_tmax_mean[6, :, :].plot.pcolormesh(
    x='lon',
    y='lat',
    cmap='coolwarm',
    shading='auto'
)
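
The positional index `6` used in the plot selects July 2010 because the resampled series has one entry per month. The arithmetic can be sketched as follows; `monthly_index` is a hypothetical helper, and it assumes a gap-free monthly series that starts in January 2010.

```python
def monthly_index(year: int, month: int, start_year: int = 2010, start_month: int = 1) -> int:
    """Zero-based position of (year, month) in a gap-free monthly series."""
    index = (year - start_year) * 12 + (month - start_month)
    if index < 0:
        raise ValueError("The requested month is before the start of the series")
    return index


print(monthly_index(2010, 7))  # 6
print(monthly_index(2011, 7))  # 18
```

Under the same assumption, `monthly_tmax_mean[monthly_index(2011, 7), :, :]` would select July 2011.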