<span style='color:#009999'> <span style='font-family:serif'> <font size="15"> **Accessing PACE data on NASA's OPeNDAP on-prem server**<span style='color:#0066cc'> 

<img src="img/PACE.png" alt="drawing" width="750"/>    


<span style='color:#ff6666'><font size="5">**Requirements**
1. <font size="3"><span style='color:Black'> Have a Bearer Token for EarthData in the Cloud (See `GetStarted` Notebook).
2. <font size="3"><span style='color:Black'> Load the Bearer Token from the local file `token.json`.


<font size="3"><span style='color:Black'> For completeness, this notebook accesses PACE data via an on-premises OPeNDAP server. The workflow is identical to accessing data on Hyrax in the Cloud.


<span style='color:#ff6666'><font size="5"> **Objectives**
 
 
- <font size="3"><span style='color:Black'> Demonstrate how to use NASA's `Common Metadata Repository` ([CMR](https://cmr.earthdata.nasa.gov/search)) to find `OPeNDAP URLs` associated with a collection.
- <font size="3"><span style='color:Black'> Demonstrate the use of `Constraint Expressions` to reduce metadata during Virtual Dataset creation
- <font size="3"><span style='color:Black'> Use `pydap`'s `consolidate_metadata` to accelerate data cube creation via `xarray.open_mfdataset`.
- <font size="3"><span style='color:Black'> Demonstrate an advanced workflow for remote data access and plotting of **Level 3** PACE data concerning surface `chlorophyll a`.


<span style='color:#ff6666'><font size="5">**Browsing Data**:

<font size="3"><span style='color:Black'> We will make use of the following libraries:

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import requests
from pydap.client import open_url
from pydap.net import create_session
import json
import cartopy.crs as ccrs
import xarray as xr
import datetime as dt
from pydap.client import consolidate_metadata

<span style='color:#ff6666'><font size="5">**Finding OPeNDAP URLs with NASA's CMR**:

<span style='font-family:serif'> <font size="3"><span style='color:Black'> Below we illustrate how to find OPeNDAP URLs via the **CMR**

<span style='color:#0066cc'><font size="3.5"> **To find (on-prem) OPeNDAP URLs you will need:**

* One of `Collection Concept ID` or `dataset DOI`
* Time Range

<span style='font-family:serif'> <font size="3"><span style='color:Black'>  On-prem OPeNDAP URLs look different from cloud OPeNDAP URLs. However, the workflow for finding OPeNDAP URLs and accessing OPeNDAP-served data remains identical. 




In [2]:
session = requests.Session()

In [3]:
# CMR API base url
cmrurl='https://cmr.earthdata.nasa.gov/search/'
doi = "10.5067/PACE/OCI/L3M/CHL/3.0"
doisearch = cmrurl + 'collections.json?doi=' + doi
print(doisearch)

concept_id = session.get(doisearch).json()['feed']['entry'][0]['id']
print(concept_id)

https://cmr.earthdata.nasa.gov/search/collections.json?doi=10.5067/PACE/OCI/L3M/CHL/3.0
C3385050568-OB_CLOUD
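
The DOI lookup above builds the query string by hand. As a hedged alternative, the sketch below builds the same kind of URL with the standard library so that the DOI is percent-encoded. `build_doi_search_url` is a hypothetical helper name introduced here for illustration; it is not part of CMR or pydap.

```python
from urllib.parse import urlencode

CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search/"  # same base URL as in the cell above


def build_doi_search_url(doi, base=CMR_SEARCH):
    """Return a CMR collection-search URL for a dataset DOI (hypothetical helper)."""
    # urlencode percent-encodes the slashes in the DOI for use in a query string
    return base + "collections.json?" + urlencode({"doi": doi})


doi_url = build_doi_search_url("10.5067/PACE/OCI/L3M/CHL/3.0")
print(doi_url)
```

The resulting URL could then be passed to `session.get(...)` exactly as `doisearch` is above.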


<span style='font-family:serif'> <font size="5.5"><span style='color:#0066cc'> **Specify time range**

<font size="3"><span style='color:Black'> This dataset covers `03-05-2024` to present day. 


In [4]:
start_date =  dt.datetime(2024, 4, 1)
end_date = dt.datetime(2024, 12, 31)

print(start_date, end_date,sep='\n')

dt_format = '%Y-%m-%dT%H:%M:%SZ' # format requirement for datetime search
temporal_str = start_date.strftime(dt_format) + ',' + end_date.strftime(dt_format)
print(temporal_str)

2024-04-01 00:00:00
2024-12-31 00:00:00
2024-04-01T00:00:00Z,2024-12-31T00:00:00Z
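
The temporal string built above can be wrapped in a small function so that other date ranges are formatted the same way. This is a minimal sketch; `cmr_temporal` is a hypothetical name, and the check on the order of the dates is an added assumption rather than something CMR requires of the client.

```python
import datetime as dt

CMR_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # same format string as in the cell above


def cmr_temporal(start, end):
    """Return 'start,end' formatted for the `temporal` search parameter."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return start.strftime(CMR_TIME_FORMAT) + "," + end.strftime(CMR_TIME_FORMAT)


temporal_example = cmr_temporal(dt.datetime(2024, 4, 1), dt.datetime(2024, 12, 31))
print(temporal_example)
```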


<span style='font-family:serif'> <font size="5.5"><span style='color:#0066cc'> **Get all available OPeNDAP URLs via CMR**

The cell below searches for all OPeNDAP URLs associated with the collection concept ID.

The results will be stored in the variable `granules_urls`.
    

In [5]:
%%time
cmr_url = 'https://cmr.earthdata.nasa.gov/search/granules'

cmr_response = session.get(cmr_url, 
                            params={
                                'concept_id': concept_id,
                                'temporal': temporal_str,
                                'page_size': 500,
                                },
                            headers={
                                'Accept': 'application/json'
                                }
                            )

granules = cmr_response.json()['feed']['entry']

granules_urls = []

for granule in granules:
    # keep the first link whose href points at an OPeNDAP endpoint, if any
    opendap_href = next((link['href'] for link in granule['links'] if "opendap" in link["href"]), None)
    if opendap_href is not None:
        granules_urls.append(opendap_href)

CPU times: user 22.5 ms, sys: 3.73 ms, total: 26.2 ms
Wall time: 826 ms


In [6]:
print(f"We found {len(granules_urls)} OPeNDAP URLs associated with this collection. Not all of them belong to the same datacube, so we need to filter them further.")

We found 500 OPeNDAP URLs associated with this collection. Not all of them belong to the same datacube, so we need to filter them further.
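
The search above requests a single page of at most 500 granules, and exactly 500 URLs came back, so the result may have been truncated. The sketch below shows one way to page through results, assuming CMR's `page_num` query parameter is used to request successive pages. `collect_opendap_urls` and `fetch_page` are hypothetical names: `fetch_page` is any callable that returns the list found under `['feed']['entry']` for a given page number, injected here so the loop can be exercised without network access.

```python
def collect_opendap_urls(fetch_page, page_size=500, max_pages=20):
    """Gather the first OPeNDAP link of every granule across result pages.

    fetch_page(page_num) is a hypothetical callable returning a list of
    granule entries (dicts with a 'links' list), as in the cell above.
    """
    urls = []
    for page_num in range(1, max_pages + 1):
        entries = fetch_page(page_num)
        for granule in entries:
            href = next(
                (link["href"] for link in granule.get("links", [])
                 if "opendap" in link["href"]),
                None,
            )
            if href is not None:
                urls.append(href)
        # a short page means there is nothing left to request
        if len(entries) < page_size:
            break
    return urls


# Offline demonstration with made-up pages of size 2
fake_pages = {
    1: [{"links": [{"href": "https://example.org/opendap/a.nc"}]},
        {"links": [{"href": "https://example.org/data/b.nc"}]}],
    2: [{"links": [{"href": "https://example.org/opendap/c.nc"}]}],
}
demo_urls = collect_opendap_urls(lambda n: fake_pages.get(n, []), page_size=2)
print(demo_urls)
```

With the real session, `fetch_page` could wrap the `session.get(cmr_url, ...)` call above with `'page_num': n` added to `params`.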


In [7]:
granules_urls[:5]

['https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0329/PACE_OCI.20240329_20240405.L3m.8D.CHL.V3_0.chlor_a.0p1deg.nc',
 'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0329/PACE_OCI.20240329_20240405.L3m.8D.CHL.V3_0.chlor_a.4km.nc',
 'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0401/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.0p1deg.nc',
 'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0401/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.4km.nc',
 'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0401/PACE_OCI.20240401_20240430.L3m.MO.CHL.V3_0.chlor_a.0p1deg.nc']

In [8]:
new_urls = [url.replace("https", "dap4") for url in granules_urls if '4km' in url and "DAY" in url]
len(new_urls)

214
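
The filter above relies on substring matches over the whole URL. As a hedged variant, the sketch below matches on the dot-separated tokens of the file name only and swaps just the leading scheme. `select_dap4_urls` is a hypothetical helper, and the token layout is assumed from the file names printed above.

```python
def select_dap4_urls(urls, period="DAY", resolution="4km"):
    """Keep URLs whose file name carries the given period and resolution
    tokens, and replace the scheme with dap4 (hypothetical helper)."""
    selected = []
    for url in urls:
        # e.g. PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.4km.nc
        tokens = url.rsplit("/", 1)[-1].split(".")
        if period in tokens and resolution in tokens:
            # split once on '://' so only the scheme is replaced
            _, rest = url.split("://", 1)
            selected.append("dap4://" + rest)
    return selected


base = "https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/"
sample_urls = [
    base + "0329/PACE_OCI.20240329_20240405.L3m.8D.CHL.V3_0.chlor_a.4km.nc",
    base + "0401/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.0p1deg.nc",
    base + "0401/PACE_OCI.20240401.L3m.DAY.CHL.V3_0.chlor_a.4km.nc",
]
daily_4km = select_dap4_urls(sample_urls)
print(daily_4km)
```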

In [9]:
# load token json data
with open('token.json', 'r') as fp:
    token = json.load(fp)


# pass Token Authorization to a new Session.
my_session = create_session(use_cache=True, session_kwargs=token)
# clear just in case
my_session.cache.clear()

## Consolidate the metadata associated with the OPeNDAP URLs

PyDAP allows you to construct a cached, persistent reference to all of the OPeNDAP URLs. This means the references to these OPeNDAP URLs can be stored on your machine for later use.




In [10]:
%%time
consolidate_metadata(new_urls, my_session)

datacube has dimensions {'lat[0:1:4319]', 'rgb[0:1:2]', 'eightbitcolor[0:1:255]', 'lon[0:1:8639]'}
CPU times: user 1.68 s, sys: 327 ms, total: 2.01 s
Wall time: 3.16 s


## Create a datacube with xarray and pydap as an engine!




In [11]:
%%time
ds = xr.open_mfdataset(new_urls, engine='pydap', session=my_session, parallel=True, combine='nested', concat_dim='time')

CPU times: user 1.39 s, sys: 382 ms, total: 1.77 s
Wall time: 1.51 s


In [12]:
ds

Dask array summary for the variable of shape (214, 4320, 8640):

| | Array | Chunk |
|---|---|---|
| Bytes | 29.76 GiB | 142.38 MiB |
| Shape | (214, 4320, 8640) | (1, 4320, 8640) |
| Dask graph | 214 chunks in 643 graph layers | 214 chunks in 643 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |

Dask array summary for the variable of shape (214, 3, 256):

| | Array | Chunk |
|---|---|---|
| Bytes | 160.50 kiB | 768 B |
| Shape | (214, 3, 256) | (1, 3, 256) |
| Dask graph | 214 chunks in 643 graph layers | 214 chunks in 643 graph layers |
| Data type | uint8 numpy.ndarray | uint8 numpy.ndarray |