# Using PyDAP for simple remote data access using DAP4

This notebook contains a simple example of how to use PyDAP and DAP4 to pull
data from remote DAP4 servers into your Python program's memory space
so that the data may be analyzed and displayed.

We show how to use simple HTTP requests to look at a dataset's metadata
and how to use the PyDAP package to read data into numpy arrays and plot
(or otherwise work with) those data. PyDAP provides lazy evaluation, so
data are read only when needed.

This tutorial utilizes the NASA Global High Resolution Sea Surface Temperature from the GOES-16 satellite. You may wish to review the summary on our [Tutorial Datasets page](https://opendap.github.io/documentation/tutorials/TutorialDatasets.html#_nasa_global_high_resolution_sea_surface_temperature_goes_16_satellite) before continuing.

You can run this tutorial in your browser using Colab.<br>

<a target="_blank" href="https://colab.research.google.com/github/OPENDAP/NASA-tutorials/blob/main/tutorials/colab_backup/3.pydap_dap4_basic.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


<br>

Copyright (C) 2023 OPeNDAP, Inc.
This Jupyter Notebook is made available under the [Creative Commons Attribution license 4.0](https://creativecommons.org/licenses/by/4.0/).

In [None]:
# Clone into the repository
! git clone https://github.com/OPENDAP/NASA-tutorials.git
# Use pip3 to install earthaccess and PyDAP.
! pip3 install earthaccess pydap
# Create dodsrc files
! ./NASA-tutorials/tutorials/setup_dodsrc.sh

In [12]:
# The requests package provides a high-level interface to HTTP/S.
# Install using 'conda install requests'
import requests

# PyDAP is an alternative to using the NetCDF library to read data.
from pydap.client import open_url
from pydap.cas.urs import setup_session

import numpy as np

# The Code class makes for a nice display of information
from IPython.display import Code

## dataset_url

The dataset_url is the DAP2/DAP4 service endpoint for the dataset.

The original publisher of this data is NASA, and NASA requires that all users
authenticate in order to access data. Setting up authentication takes additional
steps. If you don't wish to configure the authentication, this tutorial may
also be used with a copy of the data hosted on test.opendap.org, without a
requirement of authenticated access.

To use authenticated access to NASA's data:
1. See the notebook NASA EDL login or [Configure your client (in this case PyDAP) to authenticate with the appropriate Earthdata Login (EDL) service](https://opendap.github.io/documentation/tutorials/ClientAuthentication.html#_pydap)
1. Set `USE_ORIGINAL_SERVICE=True` in the following code block, before you run it.

### Log in to Earthdata Login using your username and password



In [13]:
import earthaccess
auth = earthaccess.login(strategy="interactive", persist=True)

Are you authenticated... yes.


In [14]:
USE_ORIGINAL_SERVICE = True

if USE_ORIGINAL_SERVICE:
    dataset_url = "dap4://opendap.earthdata.nasa.gov/collections/C2036877806-POCLOUD/granules/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0"
else:
    dataset_url = "dap4://test.opendap.org/opendap/tutorials/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0.nc"

print("dataset_url: ", dataset_url)


dataset_url:  dap4://opendap.earthdata.nasa.gov/collections/C2036877806-POCLOUD/granules/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0


PyDAP, like the NetCDF libraries, uses the URL scheme to distinguish
between DAP2 protocol URLs (_http://_ & _https://_) and DAP4 protocol URLs
(_dap4://_). To give PyDAP the DAP4 URL it expects, we replace any http(s)
scheme with the _dap4://_ scheme. A URL that already starts with _dap4://_
is left unchanged.

In [15]:
if dataset_url.startswith("https"):
    dap4_url = dataset_url.replace("https://", "dap4://", 1)
else:
    dap4_url = dataset_url.replace("http://", "dap4://", 1)

print("dap4_url: ", dap4_url)


dap4_url:  dap4://opendap.earthdata.nasa.gov/collections/C2036877806-POCLOUD/granules/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0
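The scheme swap can also be written with the standard library's `urllib.parse`, which avoids string matching on the URL prefix. This is only a minimal sketch: `to_dap4_url` is a hypothetical helper (not part of PyDAP), and the `example.nc` path is a placeholder.

```python
from urllib.parse import urlsplit


def to_dap4_url(url):
    """Return url with its scheme set to dap4 (hypothetical helper, not a PyDAP API)."""
    parts = urlsplit(url)
    # Only rewrite the schemes discussed in this notebook; reject anything else.
    if parts.scheme not in ("http", "https", "dap4"):
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    return parts._replace(scheme="dap4").geturl()


# Placeholder URL for illustration only.
print(to_dap4_url("https://test.opendap.org/opendap/tutorials/example.nc"))
```

A URL that already uses `dap4://` passes through unchanged, matching the behavior of the cell above.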


### Opening the dataset URL to build a dataset
In PyDAP, we use the PyDAP client to open a remote dataset and return its
associated Dataset object. When we call ```pydap.client.open_url()```, the client
downloads the DMR (the DAP4 metadata response), parses it, and then builds a
PyDAP dataset object from it. In the process, it interprets the:
- data types, including endianness
- shapes
- hierarchy (groups)
- relations (maps) of variables and dimensions
- variable attributes

No data values are downloaded at this point; rather, 'DummyData' of the
appropriate type and shape are inserted into the dataset along with the
metadata.
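To look at the DMR yourself, you can request it over plain HTTP. The sketch below assumes the usual DAP4 convention that the DMR is served at the dataset URL with a `.dmr` suffix, and that the server answers over `https`. The names `dmr_url` and `fetch_dmr` are hypothetical helpers, and the `example.nc` path is a placeholder. `fetch_dmr` expects a `requests.Session`-like object (for NASA data, one that is already authenticated) and is defined here but not called.

```python
from urllib.parse import urlsplit, urlunsplit


def dmr_url(dataset_url, scheme="https"):
    """Build the DMR request URL for a dataset URL (hypothetical helper).

    Assumes the DAP4 convention of appending '.dmr' to the dataset path.
    """
    parts = urlsplit(dataset_url)
    if parts.scheme == "dap4":
        # dap4:// is a client-side convention; the request itself goes over HTTP(S).
        parts = parts._replace(scheme=scheme)
    return urlunsplit(parts._replace(path=parts.path + ".dmr"))


def fetch_dmr(dataset_url, session):
    """Download the DMR document as text using a requests.Session-like object."""
    response = session.get(dmr_url(dataset_url), timeout=60)
    response.raise_for_status()
    return response.text


# Placeholder URL for illustration only; no network request is made here.
print(dmr_url("dap4://test.opendap.org/opendap/tutorials/example.nc"))
```

In a notebook, the returned XML could be shown with `Code(fetch_dmr(dataset_url, session), language="xml")`.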

To use the DAP4 protocol, do one of two things. Either:
- Use the DAP4 protocol scheme, 'dap4://', in the URL (canonical)
- Specify the 'protocol' kwarg in the function call:
  ```pydap.client.open_url(url, protocol='dap4', **kwargs)```.



In [17]:
# You can use either one of these. The first version forces the use of the DAP4 protocol.
# The second uses DAP2 if the URL starts with 'http...' and DAP4 if it starts with 'dap4...'

# pydap_ds = open_url(dataset_url, protocol="dap4")

pydap_ds = open_url(dap4_url)

pydap_ds._dict


HTTPError: 502 Bad Gateway
502 Bad Gateway

Bad gateway.

 Connection refused  

## Using PyDAP to read values

In [11]:
print(pydap_ds)

NameError: name 'pydap_ds' is not defined

In [22]:
print("Domain Coordinates")
print("time", pydap_ds.time.shape)
print("lat", pydap_ds.lat.shape)
print("lon", pydap_ds.lon.shape,"\n")

print("Range Variables")
print("sea_surface_temperature", pydap_ds.sea_surface_temperature.shape)
print("wind_speed", pydap_ds.wind_speed.shape)


Domain Coordinates
time (1,)
lat (2400,)
lon (2400,) 

Range Variables
sea_surface_temperature (1, 2400, 2400)
wind_speed (1, 2400, 2400)


### PyDAP only reads the data when needed
The above calls to determine the variables' shapes, just like the **open_url()**
call, do not retrieve data values. When the code *accesses* data
values in the cells below, the data are transferred.

This assignment:
```
    time = pydap_ds.time.data[:]
```
causes data values to be read. Note that only the time data are retrieved
by this statement.
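The idea of lazy evaluation can be illustrated with a toy class. This is not PyDAP's implementation: `LazyVariable` and `fake_time_server` are hypothetical names, and the "server" is just a local function. The sketch shows only the pattern in which the shape is known from metadata and values are fetched on indexing.

```python
class LazyVariable:
    """Toy stand-in for a remote variable: metadata is local, values are fetched on indexing."""

    def __init__(self, name, shape, fetch):
        self.name = name
        self.shape = shape      # known from metadata, no transfer needed
        self._fetch = fetch     # callable that simulates a server request
        self.transfers = 0      # counts simulated data requests

    def __getitem__(self, key):
        # Each indexing operation simulates one request to the server.
        self.transfers += 1
        return self._fetch(key)


def fake_time_server(key):
    """Simulate a server holding the single time value shown in this notebook."""
    return [1313110800][key]


time_var = LazyVariable("time", (1,), fake_time_server)
shape = time_var.shape                  # reading the shape triggers no request
transfers_before = time_var.transfers
values = time_var[:]                    # indexing triggers exactly one request
print(shape, transfers_before, time_var.transfers, values)
```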

In [23]:
# Get and inspect the domain coordinate variable "time"
time = pydap_ds.time.data[:]
print("time",time.shape,time,"\n")

time (1,) [1313110800] 
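The raw time value is an offset from a reference epoch. The notebook does not print the time variable's units, so the sketch below assumes the GHRSST convention of seconds since 1981-01-01 00:00:00 UTC. Check `pydap_ds.time.attributes` for the actual units before relying on it. `decode_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference epoch (GHRSST convention); verify against the 'units' attribute.
EPOCH = datetime(1981, 1, 1, tzinfo=timezone.utc)


def decode_time(seconds_since_epoch):
    """Convert seconds since the assumed epoch into a timezone-aware datetime."""
    return EPOCH + timedelta(seconds=int(seconds_since_epoch))


print(decode_time(1313110800).isoformat())  # 2022-08-12T01:00:00+00:00
```

Under this assumption the decoded time agrees with the `20220812010000` timestamp in the granule's file name.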



In [24]:
# Get and inspect the domain coordinate variable "lat"
lat = pydap_ds.lat.data[:]
print("lat",lat.shape,": ",lat,"\n")

lat (2400,) :  [-59.975 -59.925 -59.875 ...  59.875  59.925  59.975] 



In [25]:
# Get and inspect the domain coordinate variable "lon"
lon = pydap_ds.lon.data[:]
print("lon", lon.shape, "\n")
print( lon, "\n")

lon (2400,) 

[-134.975 -134.925 -134.875 ...  -15.125  -15.075  -15.025] 



In [26]:
%%time

# Get and inspect the range variable "sea_surface_temperature"
sst = pydap_ds.sea_surface_temperature.data[:]
print(sst)

[[[-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  ...
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ...   1513   1513   1513]
  [-32768 -32768 -32768 ...   1345   1345 -32768]]]
CPU times: user 568 ms, sys: 348 ms, total: 916 ms
Wall time: 6.18 s


In [27]:
# Here we drop the time dimension
# (which has size 1) using numpy.squeeze()

print("Before Squeeze, sst.shape",sst.shape)
sst = np.squeeze(sst)
print("After Squeeze, sst.shape",sst.shape)

# Convert to float32 so that fill values can be represented as NaN for plotting
sst = sst.astype(np.float32)

# Replace fill values (-32768) with NaN
sst[sst < -32000] = np.nan



Before Squeeze, sst.shape (1, 2400, 2400)
After Squeeze, sst.shape (2400, 2400)


In [28]:
# Looking at the SST data, it is clear that they need to be scaled.
print(sst)

# The attributes show the values for 'm' (scale_factor) and 'b' (add_offset) in 'y = mx + b'
pydap_ds.sea_surface_temperature.attributes

[[  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 ...
 [  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ... 1513. 1513. 1513.]
 [  nan   nan   nan ... 1345. 1345.   nan]]


{'_FillValue': '-32768',
 'long_name': 'sea surface subskin temperature',
 'standard_name': 'sea_surface_subskin_temperature',
 'units': 'kelvin',
 'add_offset': '273.14999999999998',
 'scale_factor': '0.01',
 'valid_min': '-300',
 'valid_max': '4500',
 'depth': '1 millimeter',
 'source': 'GOES_Imager',
 'comment': 'Temperature of the subskin of the ocean'}

In [29]:
# Scale the SST values to get temperatures in kelvin

scaled_sst = sst * 0.01 + 273.14999999999998
print(scaled_sst)

[[   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 ...
 [   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ... 288.28 288.28 288.28]
 [   nan    nan    nan ... 286.6  286.6     nan]]
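Instead of hard-coding the fill threshold, scale factor, and offset, the three steps (mask, scale, offset) can be driven by the variable's attributes. The sketch below is one possible approach, not the notebook's method. `decode_packed` is a hypothetical helper, and the attribute values are copied from the output shown earlier, where they appear as strings.

```python
import numpy as np


def decode_packed(raw, attrs):
    """Apply _FillValue, scale_factor, and add_offset from an attribute mapping.

    Attribute values may be strings (as in the output above) or numbers.
    """
    fill = float(attrs["_FillValue"])
    scale = float(attrs.get("scale_factor", 1.0))
    offset = float(attrs.get("add_offset", 0.0))

    values = raw.astype(np.float64)
    values[raw == fill] = np.nan      # mask fill values before scaling
    return values * scale + offset    # y = m*x + b


# Attribute values copied from the sea_surface_temperature attributes shown above.
attrs = {"_FillValue": "-32768", "scale_factor": "0.01", "add_offset": "273.14999999999998"}

# A tiny stand-in array using raw values that appear in the printed SST output.
raw = np.array([[-32768, 1513], [1345, -32768]], dtype=np.int16)
decoded = decode_packed(raw, attrs)
print(decoded)
```

With the real data, the call would be `decode_packed(np.squeeze(pydap_ds.sea_surface_temperature.data[:]), pydap_ds.sea_surface_temperature.attributes)`.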


### Create Simple Plot

In [None]:
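import matplotlib.pyplot as plt

# pcolormesh expects x (lon) first, then y (lat); scaled_sst has shape (lat, lon)
plt.pcolormesh(lon, lat, scaled_sst)
plt.colorbar(label="sea surface subskin temperature (kelvin)")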