<a name="top"></a>
<div style="width:1000 px">

<div style="float:right; width:98 px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Siphon Overview</h1>
<h3>Unidata Python Workshop</h3>

<div style="clear:both"></div>
</div>

<hr style="height:2px;">

<div style="float:right; width:250 px"><img src="https://unidata.github.io/siphon/latest/_images/tds-logo.png" alt="TDS" style="height: 200px;"></div>

## Overview:

* **Teaching:** 15 minutes
* **Exercises:** 15 minutes

### Questions
1. What is a THREDDS Data Server (TDS)?
1. How can I use Siphon to access a TDS?

### Objectives
1. <a href="#threddsintro">Use siphon to access a THREDDS catalog</a>
1. <a href="#filtering">Find data within the catalog that we wish to access</a>
1. <a href="#dataaccess">Use siphon to perform remote data access</a>

<a name="threddsintro"></a>
## 1. What is THREDDS?

 * Server for providing remote access to datasets
 * Variety of services for accessing data:
   - HTTP Download
   - Web Mapping/Coverage Service (WMS/WCS)
   - OPeNDAP
   - NetCDF Subset Service
   - CDMRemote
 * Provides a more uniform way to access different types/formats of data

## THREDDS Demo
http://thredds.ucar.edu

### THREDDS Catalogs
- XML descriptions of data and metadata
- Access methods
- Easily handled with `siphon.catalog.TDSCatalog`
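
To make "XML descriptions of data and metadata" concrete, here is a minimal sketch that parses a hand-written, heavily simplified catalog using only the standard library. The sample document, its dataset name, and its paths are invented for illustration; real catalogs carry far more metadata. This is not how Siphon is implemented, only a look at the kind of structure it reads for you.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by THREDDS catalogs and by xlink attributes
CAT = '{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}'
XLINK = '{http://www.w3.org/1999/xlink}'

# Invented, simplified catalog (assumption: illustrative content only)
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="Example catalog">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example collection">
    <dataset name="example_20170905_1830.nc"
             urlPath="example/example_20170905_1830.nc"/>
    <catalogRef xlink:href="other/catalog.xml" xlink:title="Other data" name=""/>
  </dataset>
</catalog>
"""

root = ET.fromstring(SAMPLE_CATALOG)

# Access methods: each service maps a service type to a base path on the server
services = {s.get('serviceType'): s.get('base') for s in root.iter(CAT + 'service')}

# Datasets that can be fetched directly carry a urlPath attribute
datasets = {d.get('name'): d.get('urlPath')
            for d in root.iter(CAT + 'dataset') if d.get('urlPath')}

# Catalog references point at further catalogs
refs = {r.get(XLINK + 'title'): r.get(XLINK + 'href')
        for r in root.iter(CAT + 'catalogRef')}

print(services)
print(datasets)
print(refs)
```

An access URL is then built from the server address, a service's base path, and a dataset's `urlPath`.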

In [None]:
from siphon.catalog import TDSCatalog
top_cat = TDSCatalog('http://thredds-test.unidata.ucar.edu/thredds/catalog.xml')

That takes care of downloading the catalog, parsing the XML, and exposing its contents as Python objects. From here we can do things like look at all the catalog references...

In [None]:
print(top_cat.catalog_refs)

So we can see what's available at the top level. We can also extract exactly what we're looking for using the name of the item:

In [None]:
ref = top_cat.catalog_refs['Unidata case studies']
ref.href

Or we can just access by position:

In [None]:
ref = top_cat.catalog_refs[5]
ref.href

We can then resolve that catalog reference to get a new catalog:

In [None]:
new_cat = ref.follow()
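
Conceptually, following a reference means resolving its `href` against the URL of the catalog that contained it, then fetching and parsing the result. The sketch below shows only the URL-resolution step with the standard library. The relative and absolute `href` values are hypothetical, and whether Siphon performs the resolution in exactly this way is an assumption.

```python
from urllib.parse import urljoin

catalog_url = 'http://thredds-test.unidata.ucar.edu/thredds/catalog.xml'

# Hypothetical href values, as they might appear in a catalogRef element
relative_href = 'casestudies/catalog.xml'
absolute_path_href = '/thredds/catalog/casestudies/catalog.xml'

# A relative href replaces the last path segment of the parent catalog URL
resolved_relative = urljoin(catalog_url, relative_href)

# An href starting with "/" is resolved against the server root instead
resolved_absolute = urljoin(catalog_url, absolute_path_href)

print(resolved_relative)
print(resolved_absolute)
```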

Often, it can be simpler to just start the catalog at the data collection of interest. For instance, we can manually go find the catalog for some satellite data of interest from http://thredds-test.unidata.ucar.edu:

In [None]:
cat = TDSCatalog('http://thredds-test.unidata.ucar.edu/thredds/catalog/casestudies/irma/goes16/CONUS/Channel02/20170905/catalog.xml')

From this catalog, we can look at the first 10 datasets available:

In [None]:
cat.datasets[:10]

<a href="#top">Top</a>
<hr style="height:2px;">

<a name="filtering"></a>
## 2. Filtering data

We *could* manually figure out which dataset we're looking for and generate its name (or index). Siphon provides some helpers to simplify this process, provided the dataset names follow a pattern that includes the timestamp:

In [None]:
from datetime import datetime, timedelta
ds = cat.datasets.filter_time_nearest(datetime(2017, 9, 5, 18, 30))
ds

We can also find the list of datasets within a time range:

In [None]:
time = datetime(2017, 9, 5, 18, 30)
datasets = cat.datasets.filter_time_range(time, time + timedelta(hours=1))
print(datasets)
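
Both helpers rest on the same idea: pull a timestamp out of each dataset name, then compare it with the requested time. The following is a minimal standard-library sketch of that idea, not Siphon's implementation. The dataset names, the `YYYYMMDD_HHMM` pattern, and the inclusive range bounds are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical dataset names; the only assumption is a YYYYMMDD_HHMM stamp
names = [
    'example_20170905_1800.nc',
    'example_20170905_1815.nc',
    'example_20170905_1830.nc',
    'example_20170905_1845.nc',
    'example_20170905_1930.nc',
    'README.txt',  # no timestamp, so it is skipped
]

STAMP = re.compile(r'(\d{8}_\d{4})')


def timestamped(names):
    """Yield (time, name) pairs for names that contain a timestamp."""
    for name in names:
        match = STAMP.search(name)
        if match:
            yield datetime.strptime(match.group(1), '%Y%m%d_%H%M'), name


def nearest(names, when):
    """Return the name whose timestamp is closest to `when`."""
    return min(timestamped(names), key=lambda pair: abs(pair[0] - when))[1]


def in_range(names, start, end):
    """Return names whose timestamps fall within [start, end]."""
    return [name for t, name in timestamped(names) if start <= t <= end]


time = datetime(2017, 9, 5, 18, 30)
print(nearest(names, datetime(2017, 9, 5, 18, 40)))
print(in_range(names, time, time + timedelta(hours=1)))
```

This also shows why the naming pattern matters: a name without a recognizable timestamp cannot take part in either filter.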

<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
      <li>Starting from http://thredds-test.unidata.ucar.edu/thredds/catalog/casestudies/irma/goes16/catalog.html, find the Mesoscale sector 1 imagery for Channel 2 for 6 September 2017. This is probably easiest using a browser, but you can also do this using Siphon's API.</li>
      <li>If you use a browser, grab the URL and create a TDSCatalog instance.</li>
      <li>Hurricane Irma reached peak intensity at 21Z on 6 September 2017. Using Siphon, find the data available in the catalog for an hour on either side of this time.</li>
    </ul>
</div>

In [None]:
# cat = TDSCatalog(...)

In [None]:
# %load solutions/goes_cat.py

<a href="#top">Top</a>
<hr style="height:2px;">

<a name="dataaccess"></a>
## 3. Accessing data

Accessing catalogs is only part of the story; Siphon is much more useful if you're trying to access/download datasets.

For instance, using our mesoscale data that we just retrieved:

In [None]:
# Same as before
cat = TDSCatalog('http://thredds-test.unidata.ucar.edu/thredds/catalog/'
                 'casestudies/irma/goes16/Mesoscale-1/Channel02/'
                 '20170906/catalog.xml')

# Just ask for the file nearest to the time of interest
ds = cat.datasets.filter_time_nearest(datetime(2017, 9, 6, 21))

We can ask Siphon to download the file locally:

In [None]:
ds.download('goes-data.nc4')

In [None]:
!ls -l *.nc4

Or better yet, get a file-like object that lets us `read` from the file as if it were local:

In [None]:
fobj = ds.remote_open()
data = fobj.read()
print(len(data))

This is handy if you have Python code to read a particular format.
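
As a small illustration of "Python code to read a particular format", the sketch below inspects the first bytes of any file-like object to guess its container format. The function name `sniff_format` is hypothetical, and an in-memory `io.BytesIO` stands in for the object returned by `remote_open()`, on the assumption that it supports `read(n)` like an ordinary binary file.

```python
import io

# File signatures: netCDF-4 files are HDF5 containers (signature typically at
# offset 0), while netCDF classic files start with the bytes "CDF"
HDF5_MAGIC = b'\x89HDF\r\n\x1a\n'
CLASSIC_MAGIC = b'CDF'


def sniff_format(fobj):
    """Guess the container format from the first bytes of a file-like object."""
    header = fobj.read(8)
    if header.startswith(HDF5_MAGIC):
        return 'netCDF-4/HDF5'
    if header.startswith(CLASSIC_MAGIC):
        return 'netCDF classic'
    return 'unknown'


# Stand-in for a remote file object; any object with a read() method works
fake_remote = io.BytesIO(HDF5_MAGIC + b'rest of the file')
print(sniff_format(fake_remote))
```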

It's also possible to access the remote file through services that provide a netCDF4-like interface. This allows downloading only the variables of interest, or (index-based) subsets of that data:

In [None]:
nc = ds.remote_access()

By default this uses CDMRemote (if available), but it's also possible to ask for OPeNDAP (using netCDF4-python).

In [None]:
print(list(nc.variables))
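
To ask for a specific service, one approach is to check which services the dataset offers and pass the chosen name to `remote_access`. The sketch below assumes that a Siphon dataset exposes an `access_urls` mapping from service name to URL and that `remote_access` accepts a `service` argument. The helper `pick_service`, the sample mapping, and its URLs are hypothetical stand-ins so the example runs without a network connection.

```python
def pick_service(access_urls, preferred=('CdmRemote', 'OPENDAP')):
    """Return the first preferred service name present in access_urls."""
    # Compare case-insensitively, but return the name as the catalog spells it
    available = {name.lower(): name for name in access_urls}
    for name in preferred:
        if name.lower() in available:
            return available[name.lower()]
    raise ValueError(f'none of {preferred} offered; got {sorted(access_urls)}')


# Hypothetical stand-in for ds.access_urls
sample_urls = {
    'HTTPServer': 'http://example.invalid/thredds/fileServer/example.nc',
    'OPENDAP': 'http://example.invalid/thredds/dodsC/example.nc',
}

service = pick_service(sample_urls)
print(service)

# With a real Siphon dataset, this would be used as (not executed here):
# nc = ds.remote_access(service=pick_service(ds.access_urls))
```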

<a href="#top">Top</a>
<hr style="height:2px;">