<span style='color:#009999'> <span style='font-family:serif'> <font size="15"> **Accessing PACE data on NASA's OPeNDAP on-prem server**<span style='color:#0066cc'> 

<img src="img/PACE.png" alt="drawing" width="750"/>    


<span style='color:#ff6666'><font size="5">**Requirements**
1. <font size="3"><span style='color:Black'> Have a Bearer Token for Earthdata in the Cloud (see the `GetStarted` notebook).
2. <font size="3"><span style='color:Black'> Load the Bearer Token from the local file `token.json`.


<font size="3"><span style='color:Black'> For completeness, this notebook accesses PACE data via an on-premises OPeNDAP server. The workflow is identical to accessing data on Hyrax in the Cloud.


<span style='color:#ff6666'><font size="5"> **Objectives**
 
 
- <font size="3"><span style='color:Black'> Demonstrate how to use NASA's `Common Metadata Repository` ([CMR](https://cmr.earthdata.nasa.gov/search)) to find `OPeNDAP URLs` associated with a collection.
- <font size="3"><span style='color:Black'> Demonstrate the use of `Constraint Expressions` to reduce metadata during Virtual Dataset creation.
- <font size="3"><span style='color:Black'> Use `pydap`'s `consolidate_metadata` to accelerate data cube creation via `xarray.open_mfdataset`.
- <font size="3"><span style='color:Black'> Demonstrate an advanced workflow for remote data access and plotting of **Level 3** PACE data concerning surface `chlorophyll a`.


<span style='color:#ff6666'><font size="5">**Browsing Data**:

<font size="3"><span style='color:Black'> We are interested in PACE OCI data with **doi**: `10.5067/PACE/OCI/L3M/CHL/3.0`.

<font size="3"><span style='color:Black'> The **doi** can be found using Earthdata search.

<font size="3"><span style='color:Black'> For more information about PACE, head to https://pace.oceansciences.org/ 

In [None]:
import datetime as dt
import json

import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import requests
import xarray as xr
from pydap.client import consolidate_metadata, open_url
from pydap.net import create_session

<span style='color:#ff6666'><font size="5">**Finding OPeNDAP URLs with NASA's CMR**:

<span style='font-family:serif'> <font size="3"><span style='color:Black'> Below we illustrate how to find OPeNDAP URLs via the **CMR**

<span style='color:#0066cc'><font size="3.5"> **To find (on-prem) OPeNDAP URLs you will need:**

* One of `Collection Concept ID` or `dataset DOI`
* Time Range

<span style='font-family:serif'> <font size="3"><span style='color:Black'>  On-prem OPeNDAP URLs look different from cloud OPeNDAP URLs. However, the workflow for finding them and for accessing OPeNDAP-served data is the same. 
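One way to tell the two kinds of URL apart programmatically is to check the host name. A minimal sketch, assuming cloud OPeNDAP URLs are served from `opendap.earthdata.nasa.gov` (an assumption, not something this notebook states); the helper `is_cloud_opendap` and both example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: cloud-hosted (Hyrax in the Cloud) OPeNDAP URLs use this host name.
CLOUD_OPENDAP_HOST = "opendap.earthdata.nasa.gov"


def is_cloud_opendap(url: str) -> bool:
    """Return True if the URL's host matches the assumed cloud OPeNDAP host."""
    return urlsplit(url).hostname == CLOUD_OPENDAP_HOST


# Hypothetical example URLs, for illustration only
cloud_url = "https://opendap.earthdata.nasa.gov/collections/C0000000000-EXAMPLE/granules/example"
onprem_url = "https://oceandata.sci.gsfc.nasa.gov/opendap/example/granule.nc"
print(is_cloud_opendap(cloud_url), is_cloud_opendap(onprem_url))  # True False
```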




In [None]:
session = requests.Session()

In [None]:
# CMR API base url
cmrurl='https://cmr.earthdata.nasa.gov/search/'
doi = "10.5067/PACE/OCI/L3M/CHL/3.0"
doisearch = cmrurl + 'collections.json?doi=' + doi
print(doisearch)

concept_id = session.get(doisearch).json()['feed']['entry'][0]['id']
print(concept_id)
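The cell above concatenates the raw DOI into the query string and indexes `[0]` directly, which fails with an unhelpful `IndexError` if CMR finds nothing. A minimal sketch of the same two steps as hypothetical helpers: `urlencode` percent-encodes the `/` characters in the DOI, and the concept-ID lookup raises a clear error on an empty result. The sample response is trimmed and uses a placeholder concept ID.

```python
from urllib.parse import urlencode

CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search/"


def doi_search_url(doi: str) -> str:
    """Build the CMR collection-search URL for a DOI, percent-encoding its '/' characters."""
    return CMR_SEARCH + "collections.json?" + urlencode({"doi": doi})


def first_concept_id(collections_json: dict) -> str:
    """Return the concept ID of the first collection in a CMR JSON response."""
    entries = collections_json["feed"]["entry"]
    if not entries:
        raise ValueError("CMR returned no collection for this DOI")
    return entries[0]["id"]


print(doi_search_url("10.5067/PACE/OCI/L3M/CHL/3.0"))
# Trimmed, hypothetical response body with a placeholder concept ID
print(first_concept_id({"feed": {"entry": [{"id": "C0000000000-EXAMPLE"}]}}))
```

With a live connection, `session.get(doi_search_url(doi)).json()` would supply the dictionary passed to `first_concept_id`.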

<span style='font-family:serif'> <font size="5.5"><span style='color:#0066cc'> **Specify time range**

<font size="3"><span style='color:Black'> This dataset covers `March 2024` to present day. 


In [None]:
start_date = dt.datetime(2024, 4, 1)
end_date = dt.datetime(2024, 12, 31)

print(start_date, end_date, sep='\n')

dt_format = '%Y-%m-%dT%H:%M:%SZ' # format requirement for datetime search
temporal_str = start_date.strftime(dt_format) + ',' + end_date.strftime(dt_format)
print(temporal_str)

<span style='font-family:serif'> <font size="5.5"><span style='color:#0066cc'> **Get all available OPeNDAP URLs via CMR**

The cell below queries the CMR for all OPeNDAP URLs associated with the collection concept ID.

The results will be stored in the variable `granules_urls`.
    

In [None]:
def get_opendap_urls(concept_id, time_range, _session=None):
    """
    Queries NASA's Common Metadata Repository (CMR) for all OPeNDAP URLs
    in a collection, given its concept ID and a temporal range.
    Note: only the first page of results (at most 500 granules) is returned.
    """
    cmr_url = 'https://cmr.earthdata.nasa.gov/search/granules'
    if not _session:
        _session = requests.Session()
    cmr_response = _session.get(
        cmr_url,
        params={'concept_id': concept_id, 'temporal': time_range, 'page_size': 500},
        headers={'Accept': 'application/json'},
    )
    cmr_response.raise_for_status()  # fail loudly on HTTP errors
    granules = cmr_response.json()['feed']['entry']
    granules_urls = []

    # Keep only the first link of each granule whose href contains "opendap"
    for granule in granules:
        item = next((link['href'] for link in granule['links'] if "opendap" in link["href"]), None)
        if item is not None:
            granules_urls.append(item)
    return granules_urls
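The link-filtering step inside `get_opendap_urls` can be checked offline on hand-made granule entries. The sketch below factors it into a hypothetical helper `extract_opendap_url`; the sample entries only mimic the `links` field of a CMR JSON granule entry, and their URLs are made up.

```python
def extract_opendap_url(granule: dict):
    """Return the href of the first link containing 'opendap', or None if there is none."""
    return next((link["href"] for link in granule.get("links", []) if "opendap" in link["href"]), None)


# Hypothetical CMR-style granule entries (only the fields used here)
granules = [
    {"links": [{"href": "https://example.com/data/granule1.nc"},
               {"href": "https://example.com/opendap/granule1.nc"}]},
    {"links": [{"href": "https://example.com/data/granule2.nc"}]},  # no OPeNDAP link
]
urls = [u for u in map(extract_opendap_url, granules) if u is not None]
print(urls)  # ['https://example.com/opendap/granule1.nc']
```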

In [None]:
%%time
granules_urls = get_opendap_urls(concept_id, temporal_str)
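`get_opendap_urls` requests a single page of at most 500 granules, so a longer time range could silently return a truncated list. A minimal sketch of paging, assuming CMR's `page_num` parameter and treating a short page as the last one. `collect_all_pages` and `fake_fetch` are hypothetical; the fake fetcher stands in for the HTTP request so the logic can be tested offline. CMR also offers a `CMR-Search-After` header for deep paging, which this sketch does not use.

```python
from typing import Callable, Dict, List


def collect_all_pages(fetch_page: Callable[[int, int], List[Dict]], page_size: int = 500) -> List[Dict]:
    """Call fetch_page(page_num, page_size) for pages 1, 2, ... until a page comes back short."""
    entries: List[Dict] = []
    page_num = 1
    while True:
        page = fetch_page(page_num, page_size)
        entries.extend(page)
        if len(page) < page_size:  # a short (or empty) page means nothing is left
            return entries
        page_num += 1


# With a live session, fetch_page could wrap the CMR request, e.g. (not run here):
#   lambda n, size: session.get(cmr_url, params={"concept_id": concept_id, "temporal": temporal_str,
#       "page_size": size, "page_num": n}, headers={"Accept": "application/json"}).json()["feed"]["entry"]

# Offline check: a fake collection of 1203 granules
fake_granules = [{"id": i} for i in range(1203)]
calls = []


def fake_fetch(page_num: int, page_size: int) -> List[Dict]:
    calls.append(page_num)
    start = (page_num - 1) * page_size
    return fake_granules[start:start + page_size]


all_entries = collect_all_pages(fake_fetch, page_size=500)
print(len(all_entries), calls)  # 1203 [1, 2, 3]
```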

In [None]:
print(f"We found {len(granules_urls)} on-prem OPeNDAP URLs associated with this collection. However, not all of them belong to the same data cube, so we need to filter them further.")

In [None]:
granules_urls[:10]

In [None]:
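# Keep only the daily ("DAY"), 4 km granules, and switch the scheme to dap4:// so pydap uses the DAP4 protocol
new_urls = [url.replace("https", "dap4", 1) for url in granules_urls if '4km' in url and "DAY" in url]
print(f"Of the {len(granules_urls)} OPeNDAP URLs found, only {len(new_urls)} are associated with the correct data cube.")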
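The cell above changes the scheme with `str.replace` and filters with plain substring tests. A slightly stricter sketch: `to_dap4` rewrites only the URL scheme, and `is_daily_4km` compares whole dot-separated filename fields. Both helpers are hypothetical, and the example URLs assume the PACE Level-3 mapped naming pattern `PACE_OCI.<date>.L3m.<period>.CHL.<version>.chlor_a.<resolution>.nc`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dap4(url: str) -> str:
    """Replace only the scheme (e.g. https -> dap4), leaving host, path and query untouched."""
    return urlunsplit(urlsplit(url)._replace(scheme="dap4"))


def is_daily_4km(url: str) -> bool:
    """True if the filename has 'DAY' and '4km' as whole dot-separated fields."""
    fields = url.rsplit("/", 1)[-1].split(".")
    return "DAY" in fields and "4km" in fields


# Hypothetical URLs following the assumed naming pattern
example_urls = [
    "https://example.gov/opendap/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.4km.nc",
    "https://example.gov/opendap/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.0p1deg.nc",
    "https://example.gov/opendap/PACE_OCI.20240401_20240408.L3m.8D.CHL.V3_0.chlor_a.4km.nc",
]
selected = [to_dap4(u) for u in example_urls if is_daily_4km(u)]
print(selected)
```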

In [None]:
new_urls[:10]
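The objectives mention constraint expressions, which the cells above do not apply. In DAP4, a constraint expression is passed in the `dap4.ce` query parameter as a `;`-separated list of variable paths, so that only those variables are described and served. A minimal sketch with a hypothetical helper `add_dap4_ce`; the variable names `chlor_a`, `lat` and `lon` are the ones used later in this notebook, while the root-level `/name` paths and the absence of an existing query string are assumptions.

```python
from typing import Iterable
from urllib.parse import quote


def add_dap4_ce(url: str, variables: Iterable[str]) -> str:
    """Append a DAP4 constraint expression that keeps only the given variables."""
    ce = ";".join("/" + name.lstrip("/") for name in variables)
    # Percent-encode anything unusual in the expression, but keep '/' and ';' readable
    return url + "?dap4.ce=" + quote(ce, safe="/;")


print(add_dap4_ce("dap4://example.gov/opendap/granule.nc", ["chlor_a", "lat", "lon"]))
# dap4://example.gov/opendap/granule.nc?dap4.ce=/chlor_a;/lat;/lon
```

Applied before consolidating metadata, e.g. `new_urls = [add_dap4_ce(u, ["chlor_a", "lat", "lon"]) for u in new_urls]`, the resulting data cube should contain only those variables.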

### Recover locally stored token for authentication

In [None]:
# load token json data
with open('token.json', 'r') as fp:
    token = json.load(fp)

# pass Token Authorization to a new Session.
my_session = create_session(use_cache=True, session_kwargs=token)
# clear any responses cached by previous runs
my_session.cache.clear()
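The notebook does not show the layout of `token.json`. A minimal sketch, assuming the file stores the dictionary passed as `session_kwargs`, with the Earthdata bearer token under a `"token"` key (an assumption; check the `GetStarted` notebook for the actual layout). The helpers `save_token` and `load_session_kwargs` are hypothetical; the second fails early if the token is missing, instead of letting later requests go out unauthenticated.

```python
import json
from pathlib import Path


def save_token(path, token: str) -> None:
    """Write the bearer token to a JSON file (assumed layout: {"token": "..."})."""
    Path(path).write_text(json.dumps({"token": token}))


def load_session_kwargs(path) -> dict:
    """Read the JSON file and check that it holds a non-empty 'token' entry."""
    data = json.loads(Path(path).read_text())
    if not data.get("token"):
        raise ValueError(f"{path} has no non-empty 'token' entry")
    return data
```

Usage would then be `my_session = create_session(use_cache=True, session_kwargs=load_session_kwargs('token.json'))`.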

## Consolidate the metadata of all on-prem OPeNDAP URLs

PyDAP can construct a (cached) reference to the metadata of all these OPeNDAP URLs, and the cache persists across sessions. This means the references can be stored on your machine and reused later.
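Why caching helps: `open_mfdataset` opens every URL, and without a cache each open repeats the metadata request. The toy sketch below illustrates the idea with `functools.lru_cache` around a hypothetical `fetch_dmr` stand-in. It is not pydap's implementation: the session created above with `use_cache=True` is a `requests-cache` session (hence `my_session.cache.urls()` and `my_session.cache.clear()`), and that cache, unlike this in-memory one, persists across sessions.

```python
from functools import lru_cache

requests_made = []  # records every "network" request the stand-in performs


def fetch_dmr(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET of a granule's DAP4 metadata."""
    requests_made.append(url)
    return f"<Dataset name='{url.rsplit('/', 1)[-1]}'/>"


cached_fetch_dmr = lru_cache(maxsize=None)(fetch_dmr)

granule_urls = ["dap4://example.gov/a.nc", "dap4://example.gov/b.nc"]
# First pass fetches each URL once ...
for url in granule_urls:
    cached_fetch_dmr(url)
# ... later opens of the same URLs are served from the cache
for url in granule_urls:
    cached_fetch_dmr(url)

print(len(requests_made))  # 2
```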




In [None]:
%%time
consolidate_metadata(new_urls, my_session)

In [None]:
my_session.cache.urls()[:10]

## Create a datacube with xarray and pydap as an engine!




In [None]:
%%time
ds = xr.open_mfdataset(new_urls, engine='pydap', session=my_session, parallel=True, combine='nested', concat_dim='time')
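With `combine='nested'` and `concat_dim='time'`, xarray stacks the granules in the order of `new_urls`. If the granules do not carry a time coordinate of their own, the new `time` dimension has no labels. A minimal sketch that recovers dates from the filenames, assuming the `YYYYMMDD` date appears as its own dot-separated field (as in `PACE_OCI.20240401.L3m.DAY...`); `granule_date` is a hypothetical helper.

```python
import datetime as dt
import re

# Assumed: an 8-digit date forms a whole dot-separated field in the filename
_DATE_FIELD = re.compile(r"(?:^|\.)(\d{8})(?:\.|$)")


def granule_date(url: str) -> dt.datetime:
    """Parse the YYYYMMDD field from a granule filename into a datetime."""
    filename = url.rsplit("/", 1)[-1]
    match = _DATE_FIELD.search(filename)
    if match is None:
        raise ValueError(f"no YYYYMMDD field in {filename}")
    return dt.datetime.strptime(match.group(1), "%Y%m%d")


print(granule_date("dap4://example.gov/opendap/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.4km.nc"))
```

Each time step could then be labeled with `ds = ds.assign_coords(time=[granule_date(u) for u in new_urls])`.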

In [None]:
ds

In [None]:
chlor_a = ds['chlor_a'].isel(time=0)
chlor_a

In [None]:
%%time
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
plt.contourf(ds.lon, ds.lat, np.log(chlor_a), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.attrs['long_name'] + ' ['+chlor_a.attrs['units']+']')
plt.show()
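Chlorophyll-a concentrations span several orders of magnitude, which is why the plot uses a logarithm. `np.log` returns `-inf` (with a runtime warning) for zeros and `nan` for negative values. A minimal sketch of a hypothetical helper that takes the base-10 logarithm only where values are positive and leaves everything else as NaN, so those cells stay blank on the map:

```python
import numpy as np


def log10_positive(values) -> np.ndarray:
    """Base-10 log of the positive entries; zero, negative and NaN entries become NaN."""
    arr = np.asarray(values, dtype=float)
    out = np.full(arr.shape, np.nan)
    np.log10(arr, out=out, where=arr > 0)  # only computed where arr > 0
    return out


print(log10_positive([0.01, 1.0, 10.0, 0.0, -1.0, np.nan]))
```

In the plotting cell, `log10_positive(chlor_a.values)` could replace `np.log(chlor_a)`; another option is to keep linear values and let matplotlib's `LogNorm` handle the scaling.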