# Using PyDAP for simple remote data access
This notebook contains a simple example of how to use PyDAP to pull
data from remote DAP servers into your Python program's memory space
so that the data may be analyzed and displayed.

We show how to use simple HTTP requests to look at a dataset's metadata
and how to use the PyDAP package to read data into numpy arrays and plot
(or do other things) with those data. PyDAP provides lazy evaluation, so
data are read only when needed.

This tutorial utilizes the NASA Global High Resolution Sea Surface Temperature from the GOES-16 satellite. You may wish to review the summary on our [Tutorial Datasets page](https://opendap.github.io/documentation/tutorials/TutorialDatasets.html#_nasa_global_high_resolution_sea_surface_temperature_goes_16_satellite) before continuing.

You can run this tutorial in your browser using Colab.<br>

<a target="_blank" href="https://colab.research.google.com/github/OPENDAP/NASA-tutorials/blob/main/tutorials/colab_backup/4.pydap_dap2_basic.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<br>

Copyright (C) 2023 OPeNDAP, Inc.
This Jupyter Notebook is made available under the [Creative Commons Attribution license 4.0](https://creativecommons.org/licenses/by/4.0/).

In [None]:
# Clone the tutorials repository
! git clone https://github.com/OPENDAP/NASA-tutorials.git
# Use pip3 to install earthaccess and pydap
! pip3 install earthaccess pydap
# Create dodsrc files
! ./NASA-tutorials/tutorials/setup_dodsrc.sh

In [27]:
# The requests package provides a high-level interface to HTTP/S
import requests

from pydap.client import open_url
from pydap.cas.urs import setup_session

import numpy as np

from IPython.display import Code

### Authenticate with `earthaccess`

In [None]:
import earthaccess
auth = earthaccess.login(strategy="interactive", persist=True)

## dataset_url

The dataset_url is the DAP2/DAP4 service endpoint for the dataset.

This first URL points to a copy of the example dataset hosted at test.opendap.org.

In [28]:
dataset_url="http://test.opendap.org/opendap/tutorials/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0.nc"

This second URL points to the DAP service endpoint for the test granule at the original data publisher's site, in this case NASA. That means that to utilize this URL you will need to:
* [Configure your client (in this case PyDAP) to authenticate with the appropriate Earthdata Login (EDL) service](https://opendap.github.io/documentation/tutorials/ClientAuthentication.html#_pydap)
* Uncomment the next line that points to NASA's DAP service endpoint for the example dataset.

In [29]:
# dataset_url="https://opendap.earthdata.nasa.gov/collections/C2036877806-POCLOUD/granules/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0"


## Look at the metadata
Let's look at the dataset's contents, first by looking at the variables in the dataset. We can do this by appending the extension _dds_ to the dataset's URL. The _Data Access Protocol_ denotes the different kinds of responses from a dataset using extensions. The most important ones are:
- **dds** Get information about the variables
- **das** Get semantic metadata (i.e., attributes) about the dataset and its variables
- **dods** Get binary data (for individual or groups of variables)
- **ascii** Get data as ASCII, really useful for looking at small parts of a dataset
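
As a small illustration of the extension convention listed above, the following sketch builds the request URL for each response type. The helper name `dap2_url` is hypothetical (it is not part of PyDAP or `requests`); it only concatenates strings and does not contact the server.

```python
# Hypothetical helper (not part of PyDAP): build DAP2 request URLs by
# appending a response extension to the dataset URL.
DAP2_EXTENSIONS = ("dds", "das", "dods", "ascii")


def dap2_url(dataset_url, extension):
    """Return the URL for one DAP2 response type of a dataset."""
    if extension not in DAP2_EXTENSIONS:
        raise ValueError(f"unknown DAP2 extension: {extension!r}")
    return f"{dataset_url}.{extension}"


example_url = "http://test.opendap.org/opendap/tutorials/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0.nc"
for ext in DAP2_EXTENSIONS:
    print(dap2_url(example_url, ext))
```

Pasting the `.dds` or `.das` URL into a browser is a quick way to inspect a dataset before writing any code.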

In [30]:
http_response = requests.get(dataset_url + '.' + 'dds')

In [31]:
print(http_response.url)

http://test.opendap.org/opendap/tutorials/20220812010000-OSISAF-L3C_GHRSST-SSTsubskin-GOES16-ssteqc_goes16_20220812_010000-v02.0-fv01.0.nc.dds


In [32]:
Code(http_response.text, language='C')

From this we can see that these data consist of a number of regularly gridded variables.

Let's look at the dataset's attribute information. From this we can see the values used to denote 'missing' or fill values, units, etc.

In [33]:
http_response = requests.get(dataset_url + '.' + 'das')

In [34]:
Code(http_response.text, language='C')

## Using PyDAP to read values

In [35]:
pydap_ds = open_url(dataset_url)

In [36]:
print(pydap_ds)

<DatasetType with children 'time', 'lat', 'lon', 'sea_surface_temperature', 'sst_dtime', 'sses_bias', 'sses_standard_deviation', 'dt_analysis', 'wind_speed', 'sea_ice_fraction', 'aerosol_dynamic_indicator', 'adi_dtime_from_sst', 'sources_of_adi', 'l2p_flags', 'quality_level', 'satellite_zenith_angle', 'solar_zenith_angle', 'or_latitude', 'or_longitude'>


In [37]:
print("Domain Coordinates")
print("time", pydap_ds.time.shape)
print("lat", pydap_ds.lat.shape)
print("lon", pydap_ds.lon.shape,"\n")

print("Range Variables")
print("sea_surface_temperature", pydap_ds.sea_surface_temperature.shape)
print("wind_speed", pydap_ds.wind_speed.shape)


Domain Coordinates
time (1,)
lat (2400,)
lon (2400,) 

Range Variables
sea_surface_temperature (1, 2400, 2400)
wind_speed (1, 2400, 2400)


### PyDAP only reads the data when needed
The calls above, just like the **open_url()** call, do not transfer any data values; they use only the dataset's metadata. *Accessing* values in the cells below triggers the data transfers.

Each assignment below triggers a data read. Here each variable is read in full with `[:]`; a narrower slice would transfer only the 'sliced' data. PyDAP enables constraints to be built up in this way, until an action like assignment triggers a read operation. This feature is often known as _lazy evaluation_ because the action can be _defined_ in stages and is not run until the values are needed.
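
To make the idea of a constraint concrete, the sketch below converts Python slices into a DAP2 hyperslab of the form `[start:stride:stop]`, where `stop` is inclusive. The helper `dap2_hyperslab` is hypothetical and written only for illustration; PyDAP builds its own constraint expressions internally, and this is not its implementation.

```python
def dap2_hyperslab(name, slices, shape):
    """Build a DAP2 constraint such as 'var[0:1:0][10:1:19]' from slices.

    Hypothetical helper for illustration only. DAP2 hyperslabs use an
    inclusive stop index, while Python slices use an exclusive one.
    """
    parts = []
    for slc, size in zip(slices, shape):
        start, stop, step = slc.indices(size)
        if step < 1 or stop <= start:
            raise ValueError("only non-empty slices with a positive step are supported")
        # Last index actually selected by the Python slice
        last = start + ((stop - 1 - start) // step) * step
        parts.append(f"[{start}:{step}:{last}]")
    return name + "".join(parts)


constraint = dap2_hyperslab(
    "sea_surface_temperature",
    (slice(0, 1), slice(1000, 1100), slice(0, 2400, 2)),
    (1, 2400, 2400),
)
print(constraint)
```

Appending `?` followed by a constraint like this to a `.dods` or `.ascii` URL asks the server for only that subset, which is what keeps sliced reads small.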

In [38]:
# Get and inspect the domain coordinate variable "time"
time = pydap_ds.time.data[:]
print("time",time.shape,time,"\n")

time (1,) [1313110800] 
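
The raw time value is an offset rather than a calendar date. The sketch below converts it, assuming the units are `seconds since 1981-01-01 00:00:00` UTC. That is the usual GHRSST convention but it is an assumption here, so check the `units` attribute of `time` in the DAS response before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Assumption: time units are "seconds since 1981-01-01 00:00:00" (UTC)
EPOCH = datetime(1981, 1, 1, tzinfo=timezone.utc)

raw_time = 1313110800  # value printed above
timestamp = EPOCH + timedelta(seconds=raw_time)
print(timestamp.isoformat())
```

Under that assumption the result is 2022-08-12 01:00 UTC, which agrees with the timestamp embedded in the granule's file name.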



In [39]:
# Get and inspect the domain coordinate variable "lat"
lat = pydap_ds.lat.data[:]
print("lat",lat.shape,": ",lat,"\n")

lat (2400,) :  [-59.975 -59.925 -59.875 ...  59.875  59.925  59.975] 



In [40]:
# Get and inspect the domain coordinate variable "lon"
lon = pydap_ds.lon.data[:]
print("lon", lon.shape, "\n")
print( lon, "\n")

lon (2400,) 

[-134.975 -134.925 -134.875 ...  -15.125  -15.075  -15.025] 



In [41]:
%%time

# Get and inspect the range variable "sea_surface_temperature"

sst=pydap_ds.sea_surface_temperature.array.data[:]
print(sst)

[[[-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  ...
  [-32768 -32768 -32768 ... -32768 -32768 -32768]
  [-32768 -32768 -32768 ...   1513   1513   1513]
  [-32768 -32768 -32768 ...   1345   1345 -32768]]]
CPU times: user 142 ms, sys: 281 ms, total: 422 ms
Wall time: 8.9 s


In [42]:
# Drop the time dimension (which has size 1) using numpy.squeeze()

print("Before Squeeze, sst.shape",sst.shape)
sst = np.squeeze(sst)
print("After Squeeze, sst.shape",sst.shape)

# Convert to float32 so that NaN can serve as a meaningful fill value when plotting
sst = sst.astype(np.float32)

# Replace the fill values (-32768) with NaN
sst[sst < -32000] = np.nan

Before Squeeze, sst.shape (1, 2400, 2400)
After Squeeze, sst.shape (2400, 2400)


In [43]:
# Looking at the SST data, it's obvious they need to be scaled.
print(sst)

# The attributes give 'm' (scale_factor) and 'b' (add_offset) in 'y = mx + b'
pydap_ds.sea_surface_temperature.attributes

[[  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 ...
 [  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ... 1513. 1513. 1513.]
 [  nan   nan   nan ... 1345. 1345.   nan]]


{'_FillValue': -32768,
 'long_name': 'sea surface subskin temperature',
 'standard_name': 'sea_surface_subskin_temperature',
 'units': 'kelvin',
 'add_offset': 273.15,
 'scale_factor': 0.01,
 'valid_min': -300,
 'valid_max': 4500,
 'depth': '1 millimeter',
 'source': 'GOES_Imager',
 'comment': 'Temperature of the subskin of the ocean'}

In [44]:
# Scale the SST values to get kelvin, using
# scale_factor (0.01) and add_offset (273.15) from the attributes above

scaled_sst = sst * 0.01 + 273.14999999999998
print(scaled_sst)

[[   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 ...
 [   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ... 288.28 288.28 288.28]
 [   nan    nan    nan ... 286.6  286.6     nan]]
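
Rather than typing the scale and offset by hand, the same decoding can be driven by the attribute dictionary. The sketch below shows one way to do that on a small synthetic array; `decode_packed` is a hypothetical helper, and the attribute values are copied from the output shown earlier.

```python
import numpy as np


def decode_packed(raw, attrs):
    """Mask fill values and apply 'scale_factor * raw + add_offset'.

    Hypothetical helper for illustration; missing attributes fall back
    to a scale of 1.0 and an offset of 0.0.
    """
    raw = np.asarray(raw)
    values = raw.astype(np.float64)  # astype returns a copy, raw is untouched
    fill = attrs.get("_FillValue")
    if fill is not None:
        values[raw == fill] = np.nan
    return values * attrs.get("scale_factor", 1.0) + attrs.get("add_offset", 0.0)


# Attribute values as printed for sea_surface_temperature
attrs = {"_FillValue": -32768, "scale_factor": 0.01, "add_offset": 273.15}
sample = np.array([[-32768, 1513], [1345, -32768]], dtype=np.int16)
decoded = decode_packed(sample, attrs)
print(decoded)
```

In this notebook the same helper could presumably be called with the raw array and `pydap_ds.sea_surface_temperature.attributes`, which avoids hard-coding values that differ between variables.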


### Create Simple Plot

In [None]:
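import matplotlib.pyplot as plt

# pcolormesh expects x (columns: lon) first, then y (rows: lat)
plt.pcolormesh(lon, lat, scaled_sst)
plt.colorbar(label="Sea surface temperature (K)")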