## Using OPeNDAP to access data remotely: MUR example

One of our researchers recently asked me to download the MUR (Multi-scale Ultra-high Resolution SST) dataset. She is interested in the entire available period, but only for a small region. The dataset is relatively small but is split into many files (several for each day across 19 years), and it is updated frequently. A local copy would therefore have to be checked and updated often, and its files would be spread across several sub-directories, which makes access more complicated. <br><br>
Fortunately, this data is available via OPeNDAP, a web-based protocol and server software that lets users access datasets remotely. Many analysis tools recognise an OPeNDAP URL as a filename. An OPeNDAP URL usually consists of the remote address of the file, optionally followed by constraints.<br><br>
This is one of the advantages of OPeNDAP: you don't need to download a file before using it. You can subset just the portion you need, and the software you are using will load only that data. The next time you run the same analysis, if the dataset has been updated, you will automatically be using the updated version.<br><br>


### OPeNDAP url
Let's look at an example using a test server:<br>
http://test.opendap.org:80/opendap/data/nc/sst.mnmean.nc.gz<br>
http://test.opendap.org:80/opendap/data/nc/sst.mnmean.nc.gz?sst[10:2:18][10:1:28][100:1:120],time,lon,lat<br>
If you copy and paste these URLs into your browser, you will see what an OPeNDAP form looks like.
Let's split the first URL:<br>
 * test.opendap.org:80/opendap/data is the root of the OPeNDAP catalogue; starting from this URL you can browse down the available sub-directories, in our case "nc", which holds netCDF files;
 * sst.mnmean.nc.gz is the filename.

Note that in this example the file is compressed with gzip: OPeNDAP can access compressed files without you having to download and uncompress them first.<br>
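
The same split can be done in a script with Python's standard urllib.parse module, which is handy when you build or inspect OPeNDAP URLs programmatically. A minimal sketch applied to the second (constrained) example URL; nothing here contacts the server, and the query part is simply the constraint discussed next:

```python
import posixpath
from urllib.parse import urlsplit

url = ("http://test.opendap.org:80/opendap/data/nc/sst.mnmean.nc.gz"
       "?sst[10:2:18][10:1:28][100:1:120],time,lon,lat")

parts = urlsplit(url)
# Separate the directory (catalogue path) from the file name.
directory, filename = posixpath.split(parts.path)

print(parts.hostname, parts.port)   # test.opendap.org 80
print(directory)                    # /opendap/data/nc
print(filename)                     # sst.mnmean.nc.gz
print(parts.query.split(","))       # the constraint: one entry per variable
```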
If you want to select only some variables, you can add constraints:<br>
?sst,lat <br>
The constraint syntax is a question mark followed by a list of variables.
Variables are separated by commas, and each one can be indexed with one [start:stride:stop] range per dimension, where the stop index is included. For example,
?sst[10:2:18][10:1:28][100:1:120] will return a subset of the sst array with:
 * every second time step from index 10 to 18 (10, 12, 14, 16, 18)
 * lat from index 10 to 28 included
 * lon from index 100 to 120 included
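
To make the index syntax concrete, here is a small sketch that assembles a constraint expression from Python values and lists which indexes a [start:stride:stop] range selects. The helper names (hyperslab, constraint_url, selected_indices) are hypothetical, and the code assumes, as in the list above, that sst is ordered (time, lat, lon). Because the stop index is included, selected_indices adds one to it before calling range:

```python
def hyperslab(start, stride, stop):
    """Format one DAP2 index range; stop is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def constraint_url(base_url, variables):
    """Append a constraint expression to an OPeNDAP URL.

    variables maps each variable name to a list of (start, stride, stop)
    tuples, one per dimension in the variable's own order, or to None to
    request the whole variable.
    """
    parts = []
    for name, ranges in variables.items():
        if ranges:
            parts.append(name + "".join(hyperslab(*r) for r in ranges))
        else:
            parts.append(name)
    return base_url + "?" + ",".join(parts)


def selected_indices(start, stride, stop):
    """Indexes picked by [start:stride:stop]; unlike a Python slice, stop is kept."""
    return list(range(start, stop + 1, stride))


url = constraint_url(
    "http://test.opendap.org:80/opendap/data/nc/sst.mnmean.nc.gz",
    {
        "sst": [(10, 2, 18), (10, 1, 28), (100, 1, 120)],  # time, lat, lon
        "time": None,
        "lon": None,
        "lat": None,
    },
)
print(url)
print(selected_indices(10, 2, 18))  # [10, 12, 14, 16, 18]
```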

Constraints are optional: you don't need to subset a variable, or even list any variables, in which case you get the whole dataset. They are useful when you want to select only a region or a time range. <br>
The downside is that you usually have to retrieve the coordinate values first to work out which indexes to use.<br>
We will now see how xarray can help you skip this step.<br>
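
Working out the indexes by hand means looking up where your coordinate limits fall in the coordinate arrays. A sketch of that lookup with the standard bisect module, assuming an ascending 1-D coordinate; the 1-degree latitude axis below is made up for illustration and is not the MUR grid, and index_range is a hypothetical helper. A label-based selection such as xarray's .sel(lat=slice(...)) does this kind of lookup for you:

```python
from bisect import bisect_left, bisect_right


def index_range(coord, lo, hi):
    """Return (start, stop), stop inclusive, covering the coordinate values
    that fall inside [lo, hi]. coord must be sorted in ascending order."""
    start = bisect_left(coord, lo)        # first value >= lo
    stop = bisect_right(coord, hi) - 1    # last value <= hi
    if start > stop:
        raise ValueError(f"no coordinate values between {lo} and {hi}")
    return start, stop


# Made-up 1-degree latitude axis at cell centres -89.5 ... 89.5 (not the MUR grid).
lat = [-89.5 + i for i in range(180)]

start, stop = index_range(lat, -54, -14)
print(start, stop, lat[start], lat[stop])   # 36 75 -53.5 -14.5
print(f"lat[{start}:1:{stop}]")             # lat[36:1:75]
```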

### Accessing OPeNDAP in python with xarray

I use xarray to open the file, load the data and select the time and lat/lon ranges. <br>
siphon is used to work out the list of available files from the OPeNDAP (THREDDS) catalogue.<br>

```python
import xarray as xa
# siphon's TDSCatalog is imported further down, where it is used
```


Opening one file and selecting the region

```python
file = "https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
```

If I knew exactly which indexes I was interested in, I could add a constraint to the URL above and get back only a subset of the dataset.<br>
Since xarray doesn't load the data until you tell it to, we don't even have to worry about that: opening the URL only reads the metadata and the coordinate variables.

```python
# Open the remote file lazily: no SST values are transferred yet
data = xa.open_dataset(file)
```

I can select the sst variable and a specific region using latitude and longitude values, just as I would after loading data from a local netCDF file. In fact, xarray showed me the variable names and dimensions as soon as I "connected" to the remote virtual file.

```python
sst = data['analysed_sst'].sel(lat=slice(-53.99, -14), lon=slice(140, 170))
sst
```

### Aggregated virtual files

Another powerful feature of OPeNDAP is that it also works with virtually aggregated datasets. This sounds complicated, but all you need to know is that a multi-file dataset can be exposed as a single virtual file, so you can access potentially thousands of files through a single URL. <br><br>
The MUR dataset is available as a virtually aggregated file, so we can use this version of the data to get the complete SST time series from a single URL.

```python
aggr_url = "https://thredds.jpl.nasa.gov/thredds/dodsC/OceanTemperature/MUR-JPL-L4-GLOB-v4.1.nc"
data = xa.open_dataset(aggr_url)
```

I load the data in the same way, and I select sst and the region I'm interested in exactly as before:

```python
sst = data['analysed_sst'].sel(lat=slice(-53.99, -14), lon=slice(140, 170))
sst
```

### Using siphon to find all the available files for the years 2002 to 2018

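```python
# Root of the per-day directory tree on the PO.DAAC OPeNDAP server
dap_url = "https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/"
# Browser (HTML) page of the THREDDS catalogue entry for the aggregated dataset;
# a separate name avoids overwriting aggr_url, the OPeNDAP URL opened above
aggr_catalog_url = "https://thredds.jpl.nasa.gov/thredds/catalog_ghrsst_gds2.html?dataset=MUR-JPL-L4-GLOB-v4.1"
```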
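
Because the per-day files follow a regular directory layout (year / day of year / file), their URLs can also be generated directly. A minimal sketch, assuming every file follows the naming pattern of the single file opened earlier (2002/152 is 1 June 2002, day 152 of the year). It does not check which files actually exist on the server, which is what a catalogue crawler such as siphon is for. The names mur_file_url and mur_file_urls are hypothetical:

```python
from datetime import date, timedelta

# Same root as dap_url above.
DAP_ROOT = ("https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/"
            "data/GDS2/L4/GLOB/JPL/MUR/v4.1/")
# File name suffix copied from the single file opened earlier (assumed constant).
SUFFIX = "090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"


def mur_file_url(day):
    """Hypothetical helper: URL of the MUR file for one day, laid out as
    <year>/<day of year, 3 digits>/<YYYYMMDD><SUFFIX>."""
    doy = day.timetuple().tm_yday
    return f"{DAP_ROOT}{day.year}/{doy:03d}/{day:%Y%m%d}{SUFFIX}"


def mur_file_urls(first, last):
    """Hypothetical helper: one URL per day from first to last, both included."""
    n_days = (last - first).days + 1
    return [mur_file_url(first + timedelta(days=i)) for i in range(n_days)]


urls = mur_file_urls(date(2002, 6, 1), date(2002, 6, 3))
for u in urls:
    print(u)
```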

```python
from siphon.catalog import TDSCatalog
```

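```python
# Top-level catalogue of the JPL THREDDS server
root_url = "https://thredds.jpl.nasa.gov/thredds/catalog.xml"
# XML version of the MUR catalogue page; siphon parses catalogues as XML
tds_url = "https://thredds.jpl.nasa.gov/thredds/catalog_ghrsst_gds2.xml?dataset=MUR-JPL-L4-GLOB-v4.1"
```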
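
siphon's TDSCatalog reads the XML form of a THREDDS catalogue, which is why tds_url ends in .xml while the catalogue page you open in a browser (the .html URL defined earlier) ends in .html; the query string selecting the dataset stays the same. A minimal sketch of that conversion with urllib.parse; catalog_xml_url is a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit


def catalog_xml_url(url):
    """Hypothetical helper: swap a THREDDS catalogue page's .html extension
    for .xml, keeping the query string (e.g. ?dataset=...) unchanged."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".xml"
    return urlunsplit(parts._replace(path=path))


page = ("https://thredds.jpl.nasa.gov/thredds/catalog_ghrsst_gds2.html"
        "?dataset=MUR-JPL-L4-GLOB-v4.1")
print(catalog_xml_url(page))
```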

```python
cat = TDSCatalog(tds_url)
print(dir(cat))
```

```
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_process_catalog_ref', '_process_dataset', '_process_datasets', '_process_metadata', 'base_tds_url', 'catalog_name', 'catalog_refs', 'catalog_url', 'datasets', 'ds_with_access_elements_to_process', 'latest', 'metadata', 'services']
```
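
Under the hood, a THREDDS catalogue is an XML document whose dataset elements carry a name and, for accessible data, a urlPath; large catalogues are split into sub-catalogues referenced by catalogRef elements, which siphon exposes as catalog_refs (visible in the dir() listing above). The sketch below pulls names and urlPath values out of such a document with the standard library only. The catalogue text is a hypothetical, heavily trimmed example, not the JPL server's response, and list_url_paths is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents, in ElementTree's {uri} form.
THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical, heavily trimmed catalogue used only to exercise the parser.
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="Example Aggregation" urlPath="Example/aggregation.nc"/>
  <dataset name="Daily files">
    <dataset name="day1.nc" urlPath="Example/2002/152/day1.nc"/>
  </dataset>
</catalog>"""


def list_url_paths(xml_text):
    """Return (name, urlPath) pairs for every dataset, nested or not, that has a urlPath."""
    root = ET.fromstring(xml_text)
    return [
        (ds.get("name"), ds.get("urlPath"))
        for ds in root.iter(THREDDS_NS + "dataset")
        if ds.get("urlPath")
    ]


for name, path in list_url_paths(CATALOG_XML):
    print(name, path)
```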


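```python
print(cat.datasets)                  # names of the datasets in this catalogue
for s in cat.services:               # services the server offers for them
    print(s.name, s.service_type)
print(dir(cat.datasets[0]))          # attributes of a single Dataset object
mur_url = cat.datasets[0].url_path   # dataset path, relative to the service base
print(mur_url)
```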

```
['MUR-JPL-L4-GLOB-v4.1 Aggregation']
all Compound
all_With_http Compound
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_resolved', '_resolverUrl', 'access_element_info', 'access_urls', 'access_with_service', 'add_access_element_info', 'catalog_name', 'download', 'make_access_urls', 'name', 'ncssServiceNames', 'remote_access', 'remote_open', 'resolve_url', 'subset', 'url_path']
```


```
OceanTemperature/MUR-JPL-L4-GLOB-v4.1.nc
```
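
The url_path printed above is relative to the server's service base. Assuming the OPeNDAP service is mounted at /thredds/dodsC/, as in the aggregated URL opened with xarray earlier, joining the two reproduces exactly that URL. A minimal sketch with urllib.parse.urljoin; the trailing slash on the base matters, because without it urljoin would replace the last path segment. The dir() listing above also shows access_urls and remote_access on siphon's Dataset object, which are meant to spare you this step; which services appear there depends on what the server advertises.

```python
from urllib.parse import urljoin

# Assumption: OPeNDAP service base of this THREDDS server, taken from the
# aggregated URL opened earlier. The trailing slash is required.
OPENDAP_BASE = "https://thredds.jpl.nasa.gov/thredds/dodsC/"

url_path = "OceanTemperature/MUR-JPL-L4-GLOB-v4.1.nc"  # mur_url printed above
opendap_url = urljoin(OPENDAP_BASE, url_path)
print(opendap_url)
```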
